PHP Internals News: Episode 80: Static Variables in Inherited Methods

PHP Internals News: Episode 80: Static Variables in Inherited Methods

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Static Variables in Inherited Methods" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi I'm Derick, welcome to PHP internals news, the podcast dedicated to explain the latest developments in the PHP language. This is episode 80. In this episode I speak with Nikita Popov again about another RFC that he's proposing. Nikita, how are you doing today?

Nikita Popov 0:30

I'm still doing fine.

Derick Rethans 0:33

Well, that is glad to hear. So the reason why you saying, I'm still doing fine, is of course because we basically recording two podcast episodes just behind each other.

Nikita Popov 0:41

That's true.

Derick Rethans 0:42

If you'd be doing fine 30 minutes ago and bad now, something bad must have happened and that is of course no fun. In any case, shall we take the second RFC then, which is titled static variables in inherited methods. Can you explain what is RFC is meant to improve?

Nikita Popov 1:00

I'm not sure what this meant to improve, it's more like trying to fix a bug, I will say. This is a really, like, technical RFC for an edge case of an edge case, so I should say first, when I'm saying static variables, I'm not talking about static properties, which is what most people use, but static variables inside functions. What static variables do unlike normal variables, is that they persist across function calls. For example, you can have a counter static $i equals zero, and then increment it, and then each time we call the function, it gets incremented each time and the value from the previous call is retained. So that's just the context of what we're talking about.

Derick Rethans 1:43

Why would people make use of static variables?

Nikita Popov 1:46

I think one of the most common use cases is memoization.

Derick Rethans 1:50

Can you explain what that is?

Nikita Popov 1:51

If you have a function that that computes some kind of expensive result, but which is always the same, then you can compute it only once and store it inside the static variable, and then return it. Maybe possibly keyed by by the function arguments, but that's the general idea. And this also works if it's a free standing function. So if it's not in the method where you could store state inside the static property or similar, but also works inside a non method function.

Derick Rethans 2:22

The keyword here in his RFC's title is inherited methods, I suppose. What happens currently there?

Nikita Popov 2:29

There are a couple of issues in that area. The key part is first: How do static variables interact with methods at all? And the second part is how it interacts with inheritance. So first if you have an instance method, with a static variable, then some people expect that actually each object instance gets a separate static variable. This is not the case. The static variables are really bound to functions or methods, they do not depend on the object instance. Second problem is: What happens with inheritance? So you have a parent class with a method using static variables and then you have a child class that inherits this method. There are two ways you can interpret this, either this is still the same method, so it should use the same variables, or you could say okay the inherited method is actually a distinct method and should use separate variables. PHP currently follows the second interpretation.

Derick Rethans 3:24

And is this even the case, if it's overridden, or just when it's inherited? Because there's a difference there supposed as well.

Nikita Popov 3:30

Yeah, this is the basic model that PHP tries to follow but there are quite a few edge cases. The one that's what you mentioned if you override the method, then of course, you're calling the overridden method so the static variables don't even come in. But if you then call the parent method, then usually you would expect, if you do an override and just call the parent that the behaviour is exactly the same as if you didn't override it at all. That's not the case here, because now you're calling the parent method. So the problem you have here is that if you didn't override the method, then the child method and the parent method would have different static variables. Now if you call parent, then you're just calling the parent method, so you get back to one set of static variables, which are the same for both methods. You can see they're the same for both methods, but rather because you're calling the parent method, there is only one method involved, only one set of static variables involved. You can't really just like seamlessly extend a method that uses static variables, without changing the behaviour by accident.

Derick Rethans 4:37

This sounds all very complicated.

Nikita Popov 4:39

Yes I said, I did warn you that this is an edge case of an edge case.

Derick Rethans 4:44

But I think the whole idea behind the RFC is to make it less complicated.

Nikita Popov 4:48

Yes, this is like not the, the only issue you can run into. There is another one that we have actually addressed separately, but which still exists on earlier PHP versions, which is that the value of the static variables depend on the time of inheritance. Let me be a bit more explicit there. What we had in previous versions is that half your parent method was a static variables, then you call that parent method, static variables change, then you inherit it. And again, call the inherited method. In that order: first declare the parents, call it, declare the child, call it. In that case we will actually take the static variables at the time, where the inheritance actually happens. The first call onto the parent method modify the static variables, then we will use some modified variables. From that point on, it will have a separate copy, the child method, but it will like pick up these original modifications before inheritance happens. Now in PHP 8.1, we actually already fixed that so that we always use the original values, but this is like just one more thing to the list of weird things that happen, if you use static variables inside methods and you inherit them.

Derick Rethans 6:06

I think I understand more of it now.

Nikita Popov 6:08

I think you understand more than you ever wanted to know.

Derick Rethans 6:12

You've mentioned the edge cases. What is the result, going to be once this RFC passes, which I'm going to think is quite likely

Nikita Popov 6:21

The result is, hopefully, simpler than what we have, namely that static variables are really bound to a specific function or method declaration. If you have one method using static variables, then you have only one set of static variables ever. If it's inherited, you still reuse the same static variables because there is no separate inherited method. It's just the same method in the child class. That's the concept.

Derick Rethans 6:52

And if you override it in an inherited class?

Nikita Popov 6:55

If you override it, and you call the parent method, then the behaviour is unchanged because you still have just a single set of static variables, so there is no edge case here any more because the child, the child method never had a separate set.

Derick Rethans 7:10

But if the overridden method also defines its own static variable with the same name?

Nikita Popov 7:17

That's possible. In that case, once again this rule is that each method has its own static variables and methods can have static variables and the child method can have them as well, if they are overridden and there are no name clashes between them.

Derick Rethans 7:32

Because they are going to be totally separated, meaning that any code you run in the inherited methods will only affect its static variables, and any code that runs in the original methods only affects the static variables that are bound to that specific method.

Nikita Popov 7:50

Exactly. I mean, in the end, static variables are really the type of global state, just a type that is kind of isolated to a specific namespace and doesn't cause clashes, so in that sense, it's important that these things are isolated.

Derick Rethans 8:06

And that would also make the behaviour, a lot more easier to explain than it currently is. Because every methods, has its own set of static variables.

Nikita Popov 8:15

Yes.

Derick Rethans 8:15

Or I should say, every declared methods, has its own set of static variables.

Nikita Popov 8:20

I guess that is an important distinction. If you can see the methods inside your code and see the static variable inside it, then that is a distinct one. If we ignore the exception of traits.

Derick Rethans 8:34

You're going to have to explain that as well.

Nikita Popov 8:37

Well traits are always a special snowflake. Our general model for traits is that they are compiler assisted copy and paste. So a trait should roughly behave as if you just copied all the methods into the class that's using the trait. And in that sense, if you are actually copying the code of your method with a static variables, then it should also use a distinct set of static variables for each use. And that is also how it is proposed to behave. So that is like the one exception where you have a single method declaration in your code, but each using class will get a separate set of static variables.

Derick Rethans 9:15

Because the code is copied in place, instead of linked, or used in place. It's also the case for all the methods declared in traits, they're also copied into the same symbol table as the methods belong to a class.

Nikita Popov 9:32

Yeah, that's right.

Derick Rethans 9:33

Should be reference counted in some way because you probably won't duplicate the exact data.

Nikita Popov 9:38

We of course don't actually copy the methods, or at least most of the methods, but from the programmer perspective that's how it works

Derick Rethans 9:46

Why do you say most of the methods, and not all the methods?

Nikita Popov 9:50

We separate two things there. There is the method itself. So the op-array, and then there is all the stuff it uses like the opcodes the like arguments information and so on. What they do for traits, is we share all the data, and only create a separate op-array. Reason is that there are some differences. For example, we have to adjust the scope, we have to adjust possibly the function name if aliases are involved, and we have to adjust the static variables. So it's like kind of a partial copy we do.

Derick Rethans 10:22

Which is probably the most efficient way of doing it?

Nikita Popov 10:24

Yes.

Derick Rethans 10:25

Because this RFC is changing behaviour due to bug fixes, I would probably argue, what kind of backwards compatibility issues are there? And have you looked at how much code that actually impacts?

Nikita Popov 10:39

I haven't looked how much code it impacts because this seems like pretty hard to really analyse. I mean I guess something we could easily check is how much static variables are used in methods at all. But it would be hard to distinguish whether this change. I mean how to distinguish in a completely automated way. Whether this change makes behavioural difference for a particular use case or not. So I can't really give information on that, though I would expect that impact is relatively low because the common use cases, things like memoization, they aren't affected by it, or they are only affected by it in the sense that: Then you will memoize a value only once for the whole class hierarchy instead of memoizing it once for each like inherited class.

Derick Rethans 11:32

So, it's going to improve the situation there as well, is pretty much what you're saying?

Nikita Popov 11:36

Yeah, but I'm sure there are also cases where the previous behaviour was like intentionally used. I mean it was never documented, but you know if some behaviour exists, people will always make use of it in the end, but I can't really say exactly how much impact this would have.

Derick Rethans 11:55

Do you have anything else to add, discussing this RFC?

Nikita Popov 11:59

No, I think that's it.

Derick Rethans 12:00

Then I would like to thank you for taking the time today, again, to talk to me about static variables in inherited methods.

Nikita Popov 12:08

Thanks for having me once again.

Derick Rethans 12:16

Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 79: New in Initialisers

PHP Internals News: Episode 79: New in Initialisers

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "New in Initialisers" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explain the latest developments in the PHP language. As you might have noticed, the podcasts are currently not coming out once every week, as there are not enough RFCs being submitted for weekly episodes. I suspect that this will change soon again. This is episode 79. In this episode, I speak with Nikita Popov, about a few more language additions that he's proposing. Nikita, how are you doing today?

Nikita Popov 0:43

I'm doing well, Derick, how are you doing?

Derick Rethans 0:45

I'm pretty good as though, I always much happier when it's sunny outside.

Nikita Popov 0:48

Yeah, for us to weather also turned today. Yesterday it was still cold.

Derick Rethans 0:53

We're here to talk about a few RFCs. The first one, titled "New in Initializers". What is this RFC about?

Nikita Popov 1:00

The context is that PHP has basically two types of expressions: ones, the ones used on normal code, and the other one in a couple of special places. These special places include parameter default values, property default values, constants, static variable defaults, and finally attribute arguments. These don't accept arbitrary expressions but only a certain subset. So we call those usually constant expressions, even though they are not maybe constant in the most technical sense. The differences really that these are evaluated separately so they don't run on the normal PHP virtual machine. There is a separate evaluator that basically evaluates an abstract syntax tree directly. They are just like, have different technical underpinnings.

Derick Rethans 1:49

Because it is possible to for example, define a default value to seven plus 12?

Nikita Popov 1:54

Exactly. It's possible to define it to seven plus 12, but maybe not to seven plus variable A, or seven plus some function call or something like that.

Derick Rethans 2:03

I guess the RFC is about changing this so that you can do things like this. What are you proposing to add?

Nikita Popov 2:09

Yes, exactly. So my addition is a very small one, actually. I'm only allowing a single new thing and that's using new, so you can use new, whatever, as a parameter default, property default, and so on.

Derick Rethans 2:23

In this new call, pretty much a constructor call for a class of course, can arguments to be dynamic, or do they need to be constant as well?

Nikita Popov 2:33

Rules are always recursive, so you can like pass arguments to your constructor, but they also have to follow the usual rules. So again, they also have to be constant expressions, or after this RFC, they can also include new, but they cannot include variable references and so on.

Derick Rethans 2:50

Is this something that is being defined at the grammar level or somewhere else?

Nikita Popov 2:55

We actually used to define that at the grammar level back in PHP five, but nowadays we just accept all expressions, and then we print you a bit nicer error message if it turns out it's not supported in this context.

Derick Rethans 3:08

And that happens when the AST is created.

Nikita Popov 3:10

Yeah, exactly.

Derick Rethans 3:11

The new syntax additions is to allow new in default values in places. The RFC talks a little bit about the evaluation order, and what is this and why is is important.

Nikita Popov 3:23

This is really the critical part. Right now, the kinds of things you can use in constant expressions are well as the name says, these can supposed to be constant, and not dynamic. This is not entirely true because, for example, you can like reference a class constant and referencing a class means that you call an autoloader. And that can run arbitrary code. Or you can trigger a notice that triggers an error handler and that also runs arbitrary code, like more like at the design level at the conceptual level, these really are constant expressions that are not supposed to have any side effects. Allowing new changes that in the big way, because new calls constructors and constructors can do whatever they want. I think it would be unusual to print out a string in the constructor, but some people might want to create a database connection there. The question of where exactly these expressions get evaluated becomes more important now, so we have to think about that a bit more carefully. Previously this was never really formally specified, how the evaluation works. There are some cases where it's pretty clear cut, for example, parameter default value of course gets evaluated when you call the function, and that parameter hasn't been passed. Similarly if you define a global constant, then the value is evaluated when you define the constant. And then there are the problematic cases, that I'm actually still not sure what to do about. The problematic cases are class constants, and static properties. Because the way these work right now is that they are evaluated lazily, so when you declare the class, they are not evaluated, and then later if you use the class, at least most uses of the class, they will be evaluated.

Derick Rethans 5:07

It would happened when you're done instantiate the class?

Nikita Popov 5:09

For example, yes. If you instantiate the class, if you add a static property and so on. But for example, not if you extend the class. In cases that like might potentially meet the evaluated initializers. The problem here is that, um, of course, if now your expressions can have side effects, then it's not great if you don't have like hard guarantee when it's actually going to be evaluated. On the other hand, what I actually implemented already, but I'm not sure it's a good idea is to change it to evaluate the expressions equally, so when you declare the class, we immediately evaluate all the static properties and constants.

Derick Rethans 5:47

I think I found a problem with that as well. For example, if one of the default values is doing new DateTime for example, if you rely on that happening when you instantiate the object, you will get a different time than when you declare a class.

Nikita Popov 6:01

I probably should have mentioned that explicitly. So when it comes to properties we'll always evaluate when you create the object. If you do a new DateTime then you will always get a new DateTime for each object, otherwise it wouldn't really make sense. The problem with the, with evaluation order for the static properties is, that if we evaluate immediately when they declare the class, then we can run into issues with dependencies. If you're using auto loading it's usually not a problem, but if you like declare classes manually, then you might have one class on using a constant from a class that you declare later on. Right now, that works fine, because the initializers are evaluated lazily, but if we evaluate them immediately then this is going to throw a fatal error because it says okay the class hasn't been declared yet. That's a backwards compatibility break, and there are also some other issues, for example with preloading, where it's not really clear when exactly we should be evaluating things in that context. So this is a point I'm undecided on, when things should evaluate. If we should stick with the current lazy evaluation or make it easier or possibly just limit the RFC, to not allow new inside static properties and class constants, because, at least for me personally, those are not the main use case. The three things that seem most important to me are parameter default, property default, I mean non static property default, and usage in attribute arguments.

Derick Rethans 7:29

When I was reading the RFC, I was realizing that if a constructor throws an exception, for example if it's a default argument to a method, then what would happen? Would the method fail, or, or would the method call fail or something else?

Nikita Popov 7:45

It would behave basically the same way as if you initialize the method argument, inside the method, you could do like you would do right now like with a null check maybe. So you would just get an exception thrown inside the method, and it would fail.

Derick Rethans 8:01

And is that the same if you would use it as the new new syntax as a default value to have constructor arguments as well?

Nikita Popov 8:09

Yeah, I think there's a situation would be the same, the one special case is if you use it as a property default, and then the instantiation fails, then we treat this basically the same way as the constructor having failed, which is a special situation, a slightly special situation, because we will also not call the destructor in that case. So we say that the object has been incompletely constructed so it will not get destructed.

Derick Rethans 8:38

And of course if this is a standard class property, then this happens on instantiation of the class. How would this work if it would be for a static property?

Nikita Popov 8:50

For a static property, well that, that depends again on the whole question of evaluation order. So for example the way things work right now, like without this proposal, is for example if you have a static property and you're referencing it constant that doesn't exist. Then, when you try to use the class you get an exception that okay undefined constant whatever. If you try to use it again, you still get the same exception, so you get this exception every time you use a class. This is what would happen on this case as well.

Derick Rethans 9:24

So it wouldn't happen on class declaration, but when you start using it?

Nikita Popov 9:29

Depending on where we evaluate. Either only on use or the first time on declaration and then afterwards in each use. If you still try to use it despite the declaration having failed, which is an odd thing to do but you would have to counter it somehow.

Derick Rethans 9:45

You know, if it is possible to do people will find a way how to do it.

Nikita Popov 9:48

Yes, certainly.

Derick Rethans 9:49

Can you talk a little bit about recursion protection as well because the RFC talks about that?

Nikita Popov 9:54

Well that's another edge case. So if you create an, have an, for example Class A with a property that has an initializer new A. That means when you create an object of class A, and try to initialize it you have to create another object, and then another object, and another, and we have to detect that situation, or we do detect that situation and for nice exception instead of, resulting in a stack overflow.

Derick Rethans 10:20

Which is beneficial.

Nikita Popov 10:21

Yes, because most people do not know what to do when people went PHP throws a segmentation fault, so they do prefer exceptions, usually.

Derick Rethans 10:30

I would too. The RC also talks about, there are some issues around traits which I didn't quite fully understand, would you mind explaining that to me?

Nikita Popov 10:38

The issue here is that traits can have properties and or rules. A rule is that if you have two traits, used in the same class, and declaring the same property, they have to be compatible. And compatible means effectively they have to be exactly the same. So, same visibility and same default value. The trouble here is that if we are dealing with an instance property, which has a new expression as a default value, then we have to somehow check that these are the same. It would be not great if we actually had to evaluate the initializer to do that because, I mean it's okay if it's just you know, with initializer something like one plus two, but if it's an actual new expression we don't want to create objects which again might have side effects and so on. What I'm specifying is that if you have a trait property with this kind of dynamic initializer, so using the new expression, than we will always consider it not compatible.

Derick Rethans 11:36

Would it currently be compatible with one of those trait properties, it says seven plus three, for example?

Nikita Popov 11:42

That will be compatible, which is actually, I think relatively new thing. We used to not evaluate initializers and traits at all, and say those are incompatible and that changed at some point and seven point, I don't know which version. But in this case we would go back to saying it's incompatible, because at least I don't see a good way to make it compatible and I don't think it's particularly important to support that case.

Derick Rethans 12:10

Do you have any information about how much traits are actually used?

Nikita Popov 12:15

Well, I know that Laravel uses them. But I have no idea how much.

Derick Rethans 12:22

One last thing I think RFC mentioned, is that it also has an effect on attributes, that it sort of gets nested attributes in by the back door. How does that work?

Nikita Popov 12:33

I wouldn't call it the back door. Exactly. I have to be honest, I didn't think about attributes at all when writing this proposal, what I had in mind is mainly parameter defaults, and property defaults. But yeah, attribute arguments also use the same mechanism and are under the same limitations. So now you can use new as an attribute argument. And this can be used to effectively nest attributes, so the example I've seen from Symfony is that they have, for example, assertions. They have an assert all attribute which has the which accepts, which wants to accept a list of assertion attributes. And now you can actually do that because you can, um, create these attribute objects recursively. The example from the RFC is assert all, then new assert not null, new assert length max six.

Derick Rethans 13:26

That's actually kind of neat, that is just ends up starting to work on right?

Nikita Popov 13:30

Yeah, I mean, I read the thread for Symfony how they are trying to work around that. They have various ideas of how to do it and it's all pretty ugly. So I think it's nice to have a more or less proper solution for that.

Derick Rethans 13:45

They'll just have to wait until PHP 8.1.

Nikita Popov 13:48

Yes, that is the disadvantage.

Derick Rethans 13:51

Out later this year.

Derick Rethans 13:53

Are there any backwards incompatible changes?

Nikita Popov 13:56

That again comes back to the evaluation order the problem. Originally I had intended to this, this to be compatible. Now if we change evaluation order then it is breaking, depending on that, the answer is yes or no, I am still not sure on that one.

Derick Rethans 14:11

Because I think PHP eight one already has a breaking changes in there where the order of declaration of properties is now different.

Nikita Popov 14:19

Yeah, that the change, though I hope that does not affect people too much because it's mostly about debugging functionality, which of course you are kind of interested in.

Derick Rethans 14:29

Yep, it broke my tests, which is a good thing because it means that my tests cover all the edge cases as well. I think we sort of done discussing this RFC, is there anything else that might ends up being added here in the future, or what still needs to be hammered out before you can put it up to vote?

Nikita Popov 14:47

Apart from the evaluation order question that I have been continuously mentioning, the future scope would be to extend this to not just new expressions, but also for example static method calls, popular alternative pattern is to not use constructors, but named constructors, which are implemented as static methods, and similarly also function calls for example so you can use something like strlen() or count inside an initializer.

Derick Rethans 15:13

Isn't strlen a language construct now?

Nikita Popov 15:15

No it isn't. It has an optimized implementation in the virtual machine, but it's still technically a normal function call.

Derick Rethans 15:23

Because I remember that, breaking tests in Xdebug as well at some point, because it suddenly didn't suddenly was no longer a function call.

Nikita Popov 15:30

Things do tend to break in Xdebug.

Derick Rethans 15:34

Okay, I'm used to it. Thank you, Nikita for taking the time to talk about your new in initializers RFC.

Nikita Popov 15:40

Thanks for having me.

Derick Rethans 15:45

Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supports of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 78: Moving the PHP Documentation to GIT

PHP Internals News: Episode 78: Moving the PHP Documentation to GIT

In this episode of "PHP Internals News" I chat with Andreas Heigl (Twitter, GitHub, Mastodon, Website) to follow up with his project to move the PHP Documentation project from SVN to GIT, which has now completed.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:15

Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is Episode 78. In this episode, I'm talking with Andreas Heigl about moving the PHP documentation to GIT. Andreas, would you please introduce yourself?

Andreas Heigl 0:35

Hi yeah I'm Andreas, I'm working in Germany at a company doing PHP software development. I'm doing a lot of stuff in between, as well. And one of the things that I got annoyed, was always having to go through hilarious ways of contributing to the PHP documentation, every time I found an issue with that. So at one point in time, I thought why not move that to Git and, well, here we are.

Derick Rethans 1:07

Here we are five years later, right? Because we already spoke about moving the documentation to GIT back in 2019 and Episode 28. But now it has finally happened, so I thought it'd be nice to catch up and see what actually has changed and how we ended up getting here. Where would you want to start. What was the hardest thing to sort out in the end?

Andreas Heigl 1:27

Well the hardest thing in the end to sort out was people, as probably always in software development. The technical oddities and the technical bits and pieces were rather fast to solve. What really was taking a long time was, well for one thing, actually, consistently working on that. And on the other hand, chasing down people to actually get stuff done. Because one of the major things here was not the technical side but getting the bits and pieces of information together to get access to the different servers, to the infrastructure of the PHP ecosystem, and chasing down the people that want to help you is one thing, and then chasing down the people that actually can help you is a completely different one. That was for me the most challenging bit, getting actually, to know who can do what and getting, yeah in the end, getting access to the actual machines, the whole ecosystem is running on that was really heavy.

Derick Rethans 2:34

The System Administration of PHP.net systems is very fragmented. There's some people having access to some machines and some other people having access to other machines and yea it sometimes takes some time to track down where are all these bits actually run.

Andreas Heigl 2:51

One thing is getting tracking down, where the bits run, the other one is, there is an excellent documentation in the wiki, the PHP wiki, which in some parts is kind of outdated. The other thing is, if you don't actually address the different people themselves personally, it is like screaming into the void so you can can send an email to 15 different people that have access to a box, like someone needs to do this and this and this. And everyone kind of seems to think, yeah, someone else can do that. I just don't have the time at this point in time Things get delayed, so you're waiting for an answer for a week; you do some other stuff, so two weeks go into the lab four weeks go into the land, and suddenly two months have passed. You didn't receive an answer and oh wait a minute, there was this project with moving the documentation to GIT so perhaps I should have a look at that again.

Derick Rethans 3:44

So what has changed for people that want to contribute to the PHP documentation. Can you explain a little bit the difference between before and after?

Andreas Heigl 3:52

Before the documentation was moved everything was an SVN and there was generally there were two kinds of people. There were regular contributors that had an SVN account. They had the documentation on their machine. They could actually modify the documentation and just do an SVN commit, and everything was working smoothly, so that documentation was then actually built. We ran into some issues with that as well but that's a different story. Now, it is as everything is has moved to GIT, the sources are now on git.php.net. Before I come to that, what was it for people that did not have an SVN account. There was an awesome piece of technology on a web server called edit.php.net, which was a graphical user interface to the PHP documentation and to the translation so everyone could more or less log in there, with an anonymous account for example, modify the documentation, and create yeah well a kind of a pull request Create a patch that was then reviewed and could then be merged by people with SVN access. That awesome piece of technology was an awesome piece of technology when it was created some 12 years ago or something like that. It has not changed much in between. So it was still kind of working, not always, it was offline for some time. And the people that had access to the SVN were not really that responsive at all times so it could take some while for your patch to actually be merged in. So how is it now, now all the sources are on git.php.net, and there is a mirror for all translations, on https://github.com/php/, for example, doc-en for the English documentation. You have the repository, and you can create a pull request there. So you just move there, edit the file you want to move, create a pull request. And then the pull request will be merged into the main sources, again, by people with merge access. As this is a pull request that usually happens, a bit faster than before. We are still not at that point where we can create a GitHub action for that so that perhaps that can be completely automated after some technical things are resolved, like yes that does build, and we don't have any issues, technical issues with that. We could just build that automatically and merge that automatically into to the GIT sources. Those are possibilities that we now have, and from that point of view, now the contribution is much easier as we are using the technology that every one of us knows already.

Derick Rethans 6:34

The original translation to work with sub modules in SVN, now how is with the GIT approach?

Andreas Heigl 6:40

It actually didn't work with sub modules in SVN, it worked with, actually, one folder for the documentation, and then sub folders for the different translations. So there was a base folder PHP Doc, and within that base folder PHP Doc, there was a folder EN for the English base, kind of, and then DE for the German translation or PT_BR for the Brazilian Portuguese translation, or IT for the Italian translation. Or some other rather outdated translations that we actually didn't move over, so we only moved everything that was touched within the last two years, which brought some interesting bugs. Of course we actually moved something that was touched within the last two years, but that is not considered an active translation, and that caused some havoc.

Derick Rethans 7:30

Which translation was that?

Andreas Heigl 7:31

That was the Italian translation. So actually, there is no official Italian translation of the PHP documentation. But there is an Italian translation that is actually worked on, and hopefully at one point in time, we can actually promote that to a valid active translation, so that Italian people can actually see some Italian documentation for them. In GIT on the other hand, we have different repos, different repositories for the different languages. It is now, not possible to just check out phpdoc, and have every translation. Now you actually have to say, I want to check out the English documentation, and I want to check out the Italian documentation, and I want to check out the Japanese documentation, because I want to work on each of them. That has some disadvantages, especially for people that are working on multiple documentations or multiple translations. On the other hand, that also has advantages because you don't need to actually download all the translations that you're not at all interested in.

Derick Rethans 8:35

But that shouldn't be something new because I'm pretty sure that I've never checked out all the translations, even with SVN.

Andreas Heigl 8:41

SVN you could decide to only check out certain translations but if you check out the PHP doc base folder you would get all the documentation. Yes, there were some sub modules that actually did exactly that, like, if you check out the sub module for Italian for example you would get the English base and the Italian translation. That was all.

Derick Rethans 9:04

I remember we employed some SVN magic to do that kind of things, but I forgot the most about that because it's so long ago,

Andreas Heigl 9:11

Not really to worry about.

Derick Rethans 9:14

No.

Andreas Heigl 9:14

We're thinking about doing this similar thing for GIT now for the by using GIT sub modules, but we have not yet implemented that, because there were other pressing issues like getting the ref check documentation up and running, where you can actually see which files are outdated which files need to be translated and stuff like that. So that was more more pressing some other people have done, also work on that need to check what the current status is, to be honest, because I didn't check that. That was going on very strongly in January, after we moved between Christmas and New Year's Eve. After we equalized some glitches that happened during the whole process, because of Yeah, sometimes also processes that were nowhere really documented, and I got just got plain wrong. So then I had to invest some time and fix all that, but luckily that was during my holiday time so I had a lot of time for that so that was not an issue.

Derick Rethans 10:15

So working on the documentation during your holiday's huh?

Andreas Heigl 10:18

Yes, definitely.

Derick Rethans 10:20

That makes it different from travelling to visit family, because that is of course not something we could do to here.

Andreas Heigl 10:25

Exactly, though. Luckily, having a family at home was quite okay it was a nice change to actually be able to get away sometime, from seeing the same people over and over again.

Derick Rethans 10:38

Now the documentation has moved from SVN to GIT, and everybody can now finally forget all their SVN commands. But the documentation itself is still written in Docbook. XML as far as I understand. Are there any discussions going on about changing that to something, perhaps a bit more modern?

Andreas Heigl 10:58

Yes, there were a lot of discussions going on during the whole phase. I deliberately try to calm that down to not to too many things at once. The thing that I wanted to get going was moving the documentation from SVN to GIT. Just change the underlying source code repository, and not change anything else in the process, because that was already hard enough. Now that we have moved, it is easily possible to actually modify or move the documentation to some other toolage, whether that is markdown or ASCII doc or whatever. I don't care to be honest because that's someone else's job to do. In my opinion, Docbook is actually a pretty good format for that. Yes it is rather verbose for sure, but it allows you to create a lot of different documentations, because the HTML is not the only documentation that is created, there is also the possibility to create a PDF documentation or a CHM for Windows documentation, and stuff like that. I'm not 100% sure how that would work with a rather, with something like like markdown or ASCII doc or something like that.

Derick Rethans 12:12

There's different strengths in different formats. Markdown for example doesn't really allow you to link in between documents, so that's probably not very handy but there's like Pandoc, which is stuff that the Python project uses. It's all pretty much designed around restructured text and linking in between them and stuff like that, so I guess that could be a way forward. It is certainly a lot easier to use than Docbook XML, but of course Docbook XML was created for this kind of rich marker without laying things out kind of situations right.

Andreas Heigl 12:44

Yeah. The nice thing is actually that in with Docbook. What perhaps that is possible with other tools as well but in Docbook for example, you have one single file where all the links are located in, and every translation just links to this one file, so if a link changes for whatever reason, you can just modify that in one place and don't have to go through all the documentation. And, of course, leave half of the links unchanged, and broken, whatever. So there is a lot of stuff actually that is pretty cool. But as I said that's now up for discussion. If there is someone that actually wants to tackle that and move the documentation format to something else that is a different story. Go ahead, propose that to the internals mailing list, or to the documentation team, we'll see how it goes from there. The documentation itself, the source code itself, is hosted on a platform that we all understand, at least by now a bit better than SVN.

Derick Rethans 13:42

Even though it started out on CVS, just like PHP that's.

Andreas Heigl 13:46

Yes.

Derick Rethans 13:47

A long long time ago.

Andreas Heigl 13:49

I found the remaining stuff from the transfer from S from CVS to SVN, yes,

Derick Rethans 13:55

I am sure there's still some commits lurking somewhere for that.

Andreas Heigl 13:59

Oh yes, especially in the in the ref check, there is a lot of commented out code with yeah CV, in CVS we did it this way now we have to modify that for SVN. Yes, I'm pretty sure now we have some commits in there that modify the SVN stuff for making it usable with GIT.

Derick Rethans 14:16

Andreas, do you have anything else to add?

Andreas Heigl 14:19

In all it was an awesome experience for me. I got to know a lot of people, a lot of awesome people. That was really really insightful, and I'm really happy that I had the chance to do that, and the trust of the community was really amazing. If anyone wants to get into that stuff, kind of stuff, pick yourself something that the community needs, and go for it, and don't let yourself be derailed by unresponsive mailing lists, or just things not happening. It's not because people think you are stupid, or the task is stupid, it's just because everything is done by volunteers that just pays its price.

Derick Rethans 15:00

It certainly does. Thank you, Andreas for taking the time today to talk to me about moving to PHP documentation to GIT.

Andreas Heigl 15:07

Thank you very much. It was a pleasure to be here.

Derick Rethans 15:13

Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time.


PHP Internals News: Episode 77: fsync: Buffers All The Way Down

PHP Internals News: Episode 77: fsync: Buffers All The Way Down

In this episode of "PHP Internals News" I chat with David Gebler (GitHub) about his suggestion to add the fsync() function to PHP, as well as file and output buffers.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:13

Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 77. In this episode I'm talking with David Gebler about an RFC that he's written to add a new function to PHP called fsync. David, would you please introduce yourself?

David Gebler 0:35

Hi, I'm David. I've worked with PHP professionally among other languages as a developer of websites and back end services. I've been doing that for about 15 years now. I'm a new contributor to PHP core, fsync is my first RFC.

Derick Rethans 0:48

What is the reason why you want to introduce fsync into the PHP language?

David Gebler 0:52

It's an interesting question. I suppose in one sense, I've always felt that the absence of fsync and some interface to fsync is provided by most other high level languages, has always been something of an oversight in PHP. But the other reason was that it was an exercise for me in familiarizing myself with PHP's core getting to learn the source code, and it's a very small contribution, but it's one that I feel is potentially useful, and it was easy for me to do as a learning exercise.

Derick Rethans 1:16

How did you find learning about PHP's internals?

David Gebler 1:19

Quite the roller coaster. The PHP internals are very arcane I suppose I would say, it's it's something that's not particularly well documented. It's quite an interesting challenge to get into it. I think a lot of it you have to pick up from digging through the source code, looking at what's already been done, putting together the pieces, but there is a really great community on the internals list, and indeed elsewhere online, and I found a lot of people very helpful in answering questions and again giving feedback when I first opened my initial proof of concept PR

Derick Rethans 1:48

Did you manage to find room 11 on Stack Overflow chat as well?

David Gebler 1:52

I did not, no.

Derick Rethans 1:53

I'll make sure to add a link in the show notes and it's where many of the PHP core contributors hang out quite a bit.

David Gebler 2:00

Sounds good to know for the future.

Derick Rethans 2:02

I read the RFC earlier today. And it talks about fsync, but it also talks about flush, or f-flush. What is the difference between them and what does fsync actually do?

David Gebler 2:14

That's the question that will be on everyone's lips when they hear about this feature being introduced into the language, hopefully. What does fsync do and what does fflush do? To understand that we have to understand the concept of the different types of buffering, an application runs on a system. So we have the application or sometimes called the user space buffer, and we have the operating system kernel space buffer,

Derick Rethans 2:36

And we're talking about writing to file here right?

David Gebler 2:39

We're talking about writing to files with fflush and fsync yep. So fflush and fsync are both about getting data out to a file, and using it to push something out of the buffers. But there are two different types of buffers; we have the application buffers; we have the operating system buffers. What fflush does is it instructs PHP to flush its own internal buffers of data that's waiting to be written into a file out to the operating system. What it doesn't do is give us any guarantee that the operating system will actually write that data to disk, so the operating system has its own buffer as well. Computers like to be efficient, operating systems like to be efficient, they will save up disk writes and do them in the order they feel like at the time they feel like. What fsync does is it also instructs the operating system to flush its buffers out to disk, thus giving us some kind of better assurance that our data has actually reached disk by the point that function returns.

Derick Rethans 3:29

I really only know about Linux here, but I know on Linux that there's a journal in the file system or, in most of the file systems that it uses. Would have seen make sure the data ends up in a journal, are also committed as how the file system does it itself?

David Gebler 3:45

The manner in which fsync synchronizes is indeed dependent on the particular POSIX type operating system, and file system that it's running on. I think you're right in respect of modern Linux and ext4, that would ensure the journal was also updated and that the data was persisted to disk. Older versions may behave a little bit differently. And one thing to note with fsync is that you can't take it as a cast iron guarantee that the data has either basically been persisted to disk, or that the corresponding file system entries have been updated. It is the best assurance you can get that those things have happened, and by the point that function returns, again, in PHP implementation, you have a solid an assurance as you're going to get. Because that manner of synchronization is dependent on the underlying system. It can vary. Yeah Linux ext4 is probably your best bet with fsync, but one interesting thing to know if it does go into PHP and people start using it, is that when you call fsync on a file, if you want to ensure that the file system entries are properly updated as well, you should also call fsync or a handle to that file's containing directory. And of course on a Linux or POSIX type system you can do that you can you can fopen a directory, and you can then call a sync on that handle.

Derick Rethans 5:02

You mentioned that you can call fsync on a file handle. How's it different than calling just fsync without an argument? Is fsync without an argument just telling the operating system to flush its current buffers, and fsync on a file handle specifically to flush that specific file, and its metadata?

David Gebler 5:22

So in the PHP implementation of fsync, you must supply it with a stream resource that is plain File Stream. You can obviously at the PHP level give it some other kind of resource but it will return a warning if it's not able to convert that into a plain File Stream. Under the hood, when you're talking about C, underlying operating system level, fsync operates on a file descriptor so it doesn't even operate on a stream handle; it operates on the actual underlying number that represents the file at the operating system level.

Derick Rethans 5:51

So that is different from the Unix shell command called fsync.

David Gebler 5:55

It is different. Yep.

Derick Rethans 5:57

The RFC also talks about another function called fdatasync. How's that different from fsync itself?

David Gebler 6:03

I think that was a really good question because it's got two answers. It's got the theoretical answer and the practical answer. The practical answer is that these days, it most likely makes no difference at all. The theoretical answer is the fdatasync only pushed out the actual file data to a disk write, but it wouldn't necessarily update metadata about the file, such as the time it was last modified the data that might be stored in the file system about the file. The reality now is that most operating systems, we'll update both of those things at once. The reason they were separated out in the POSIX specification is because of course updating the metadata is another write, so a call to fsync could require two rights were call to fdatasync would only require one. I'm not aware of any operating system and file system now that I'm going to use that actually still treats them differently

Derick Rethans 6:57

In PHP, we try to make sure that functions that are implemented work exactly the same on operating systems, and you've already explained that fsync depends a little bit on how the file system handles the specific requests. Would fsync also work in a similar way on on Windows or OSX, or is it specifically meant for just Linux?

David Gebler 7:20

It's not specifically meant for just Linux. fsync itself is part of the POSIX specification. Strictly speaking, fsync as an operating system level API does not exist on Windows. Windows does have a similar API mechanism called flush file buffers, and in the RFC, and in the implementation attached to that RFC, on Windows, that's what fsync does, it's a wrapped call to flush file buffers. It has, in practice the same effect. OSX is a bit of a trickier one. fsync does exist on OSX I know. I'm not a user of Apple products myself but I can tell you what I know about OSX. fsync on OSX, it will sort of attempt to flush your file buffers out, but OSX itself will not guarantee that the disk buffers are cached. So we have another layer of buffering there. You have the application space buffering, we have the kernel space buffering, and we have the hardware disk buffering itself. Physical drives will sometimes lie about having written data to disk. I mean USB drives are a notorious example of that. A USB drive will tell your operating system it's finished persisting data when it hasn't, and you can even see that on the little flashing LED on the drive, and if you pull it out your data will be corrupted. OSX is not so reliable. The interface exists, but whether it's actually worked or not is open to question because it may not have flushed the disk cache.

Derick Rethans 8:38

Are there any backward compatibility issues with this RFC or the implementation of this?

David Gebler 8:43

I'm pleased say there are no backwards compatibility issues at all. It's straightforwardly a new function that operates on plain file streams. It triggers a warning if you give it some kind of resource that can't be converted to a plain file descriptor - no consequence to not using it. You just get the same behaviour you've had in previous PHP versions. It's just a new optional function.

Derick Rethans 9:06

What has the feedback to introducing fsync and fdatasync been so far?

David Gebler 9:11

When I originally proposed the RFC, I had a bit of feedback on the internals list and around some of the other aspects of the PHP community that I reached out to in various places. Some people didn't see a need for it, which is fair enough, but my answer to that would be the when we look at the history of PHP as a web first language, you can see why people might not have had much of a use case for something like fsync. I mean it only fits particular situations, which is where you want some kind of transaction or guarantee. Quite often what people were doing with PHP were web applications where there was a degree of volatility in file rights, that wasn't important to them; they didn't particularly care about it. What I would say now is that when we look at PHP eight, and the evolution of the PHP ecosystem. You're seeing that it is, it is such a versatile and performance general purpose programming language these days, that people are using it for all manner of tasks. Using it in micro service architectures for back end services, they're using it even for things like data science and machine learning, emerging industries like that, so it's got so many more applications now. Broadly the feedback I had was, for the most part, probably wouldn't take much more than a pull request for it to be accepted. I think it's a very non controversial RFC. It's a small feature to add in the form of a new function.

Derick Rethans 10:33

And that is very different than Introducing enumerations or Fibers, of course.

David Gebler 10:38

Both of which I'm looking forward to.

Derick Rethans 10:40

We spoke a little bit about file system buffers, once you do a write it up in the application buffers, with flush, you can flush that to the operating system buffers and fsync to the file system. But in PHP development, there's also other buffers, that if you output something to the screen on the command line that ends up directly on the terminal. When you do this when PHP runs in a web server environment it doesn't do that because there's more buffers in between right. There's PHP's own buffers, there's buffers you can configure, there is then web server buffers, and networking buffers. So it's buffers all the way down pretty much, isn't it? How does the buffering in PHP's output work, and what kind of things can you do with that?

David Gebler 11:21

When we look at buffering in PHP, this is very easy to get confused, because he says buffers all the way down. On one hand we may be talking about buffers at the file system level and application space and kernel space, and on the other side we're gonna be talking about these kinds of buffers that you've just mentioned. Again, this is something that PHP does primarily for performance and perhaps for a few other purposes in how it manages the application that is running. Buffering is a way of storing output, somewhere before it gets sent on to the web server and ultimately from the web server to a user's browser. As you say a web server has its own buffering as well and PHP provides a couple of functions by which you can also attempt to force output to the browser. So again we have much as we have fflush for files. We have flush for regular output that you're trying to send to a web server. And that function will flush the internal output buffers of PHP, and it will then try and flush your web server buffers. There's an interesting parallel here because much like with fsync, flush versus fsync, you can't necessarily guarantee with fflush what the operating system will do. Your data wanted received it into its own buffers. With PHP you can't necessarily get a guarantee that a web server will flush its own buffers when you flush your output there. Perhaps we need to invent some kind of fsync for the web server as well. Output buffering is something that you can configure in PHP and you can stack and nest output buffers as well, so that means from an application developer point of view, you are using some other components, some other bit of code in your PHP system that produces its own output. When I say output, I mean via the normal mechanisms you would write something to a browser in PHP, so things like echo at the simplest level. What you can do is you can use PHP's output buffer functions, which are all the ones that start with ob then an underscore, to capture that output and control what you do with it, instead of it being output to the browser; you can capture it into a string, you can manipulate it, you can discard it, you can throw it up the chain to be output yourself. That's what we would primarily use those buffers for in PHP, but output buffering is also something you can configure in the php.ini file. You can turn it off, you can set the buffer size. Do you have a little bit of fine tuning there that you can do.

Derick Rethans 13:39

Something of that just popped in my mind is that when you call the new fsync on a file resource handler, file resource in PHP are implemented with an underlying interface because streams. Is there another fall writing buffer in streams itself as well?

David Gebler 13:58

There is a buffer in the actual C library level when it comes to streams. That's going into the detail a little bit of what I was talking about earlier where you have user space buffers and kernel space buffers. The reason those things exist is to do with the way an operating system manages a computer and keeps everything safe. User space isn't able to access kernel space. It's about range of memory addresses in the computer; we're getting quite low level now in terms of how all this works. In the underlying implementation in PHP source code, we're using the C File Stream functions to write the streams, and that means that the actual data gets copied into a buffer in userspace. What fsync does is it instructs the kernel to make a copy of that data in its own memory space, and then push that out to disk. It's getting quite low low level when we get into those details, it has to do with how file access is managed in PHP. There is an even lower level of access, which is more suitable for very high throughput intensive IO operations, which isn't available in PHP at all. I'm not so familiar with how this works on Linux because the file systems are different, but I can tell you in Windows, if you if you want really intensive throughput, you don't want to be calling fsync or equivalent flush file buffers, you know, hundreds of 1000s of times per second. You want to use what we call unbuffered output, where you write directly at the block level to disk. It wouldn't be suitable to try and do something like that in PHP, because it's to higher level language. It's a very very fine level of control that you need to be able to do that kind of thing. But with that level of control you also have a high level of consequence if something goes wrong.

Derick Rethans 15:39

I think there's actually an extension in PECL, or there used to be one at least, I'm not sure but it's still maintained for newer PHP versions, called dio or d.i.o., standing for direct IO, which I think implements some of these features, but I don't quite remember but I thought that was file system related, or just only network related direct IO.

David Gebler 16:00

I think direct IO extension did have lower level file system access off the top of my head. It's been a long time since I looked at it, and I think it actually had the option to open a file in synchronized mode, which means every write is essentially implicitly calling fsync. What it didn't have was the ability to open file writes in unbuffered mode, which is probably because that's lower level, still I mean that requires you to literally know the block size of the file system you're writing to and to write in that, in that size,

Derick Rethans 16:30

Doing that from PHP goes a little bit too far, I suppose.

David Gebler 16:34

It does go too far, I think.

Derick Rethans 16:36

Then again, if you can open a file you can open a file device as long as you have permission. So, I guess, nothing stops you from at least on Linux opening /dev/sda whatever number it is, as long as you're running PHP as root and writes to it, but I don't think this is a wise thing to do.

David Gebler 16:52

Definitely something I'd urge people to try with caution. I mean, obviously on Linux you you do have this kind of extraordinary power from the combination of the user you're running a process as, coupled with the fact that Linux treats everything as a file, and then actually has some interesting implications for fsync, you can try compiling the branch that I've submitted in the PR for the RFC; compile it on Linux, call fsync on some different handles to things which aren't actually files, but which to the underlying Linux operating system look like files. Hopefully the implementation I've provided is robust enough that it should just return false when you try and fsync things that are not actually files.

Derick Rethans 17:28

What kind of things are you thinking of here, things like directories?

David Gebler 17:32

Directories are fine for fsync and you should actually be able to get a successful fsync on directories because they are literally part of the file system. It's fine on POSIX type systems to do that. I'm not actually sure whether you can do that on Windows or not, but I don't think it provides any particular benefit to fsync a directory on Windows because the underlying flash file buffers API works on the file level only, but on Linux you could try opening file handles to something like a Unix socket and try and fsync that. You should just get false in PHP land from that function because it knows that it can't convert that file handle into an actual file stream, and thus can't get a regular fsync on it.

Derick Rethans 18:12

Have you created a test case for that situation?

David Gebler 18:15

I have created some test cases for opening streams to things that are not files from PHP. I can't whether I've covered that particular one or not, but I have got a couple in there for things which are not files at the PHP level.

Derick Rethans 18:29

That's good to hear. When do you think he will be putting this up for a vote?

David Gebler 18:34

I'm planning to put this up for a vote later this week actually. The RFC has been open for a little while, I think it's been about three weeks since I announced it on internals, and obviously there hasn't been a huge amount by way of feedback or discussion on it lately. That doesn't particularly surprise me, it is not a particularly controversial thing to add. I think a lot of people probably don't have much feeling about it one way or the other. But then, I'm hopeful, people who vote will see that as a reason to include it.

Derick Rethans 19:00

Thank you David for explaining the fsync RFC and related topics to me today.

David Gebler 19:05

Well, thanks for having me on. It's been great talking to you. And of course to anyone listening who's interested in the RFC, do check it out on the PHP wiki. Do have a look at the discussion on internals. And if you're a voting member, don't forget to vote when I open it up to vote later this week.

Derick Rethans 19:21

Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time.

Show Notes


PHP Internals News: Episode 76: Deprecate null, and Array Unpacking

PHP Internals News: Episode 76: Deprecate null, and Array Unpacking

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about two RFCs: Deprecate passing null to non-nullable arguments of internal functions, and Array Unpacking with String Keys.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi I'm Derick. Welcome to PHP internals news, a podcast dedicated to explain the latest developments in the PHP language. This is Episode 76. In this episode, I'm talking with Nikita Popov about a few more RFCs that he has been working on over the past few months. Nikita, would you please introduce yourself.

Nikita Popov 0:34

Hi, I'm Nikita. I work on PHP core development on behalf of JetBrains.

Derick Rethans 0:39

In the last few PHP releases PHP is handling of types with regards to internal functions and user land functions, has been getting closer and closer, especially with types now. But there's still one case where type mismatches behave differently between internal and user land functions. What is this outstanding difference?

Nikita Popov 0:59

Since PHP 8.0 on the remaining difference is the handling of now. So PHP 7.0 introduced scalar types for user functions. But scalar types already existed for internal functions at that time. Unfortunately, or maybe like pragmatically, we ended up with slightly different behaviour in both cases. The difference is that user functions, don't accept null, unless you explicitly allow it using nullable type or using a null default value. So this is the case for all user types, regardless of where or how they occur as parameter types, return values, property types, and independent if it's an array type or integer type. For internal functions, there is this one exception where if you have a scalar type like Boolean, integer, float, or a string, and you're not using strict types, then these arguments also accept null values silently right now. So if you have a string argument and you pass null to it, then it will simply be converted into an empty string, or for integers into zero value. At least I assume that the reason why we're here is that the internal function behaviour existed for a long time, and the use of that behaviour was chosen to be consistent with the general behaviour of other types at the time. If you have an array type, it also doesn't accept now and just convert it to an empty array or something silly like that. So now we are left with this inconsistency.

Derick Rethans 2:31

Is it also not possible for extensions to check whether null was passed, and then do a different behaviour like picking a default value?

Nikita Popov 2:40

That's right, but that's a different case. The one I'm talking about is where you have a type like string, while the one you have in mind is where you effectively have a type like string or null.

Derick Rethans 2:51

Okay.

Nikita Popov 2:52

In that case, of course, accepting null is perfectly fine.

Derick Rethans 2:56

Even though it might actually end up being different defaults.

Nikita Popov 3:01

Yeah. Nowadays we would prefer to instead, actually specify a default value. Instead of using null, but using mull as a default and then assigning something else is also fine.

Derick Rethans 3:13

What are you proposing to change here, or what are you trying to propose to change that into?

Nikita Popov 3:18

To make the behaviour of user land and internal functions match, which means that internal functions will no longer accept null for scalar arguments. For now it's just a deprecation in PHP 8.1, and then of course later on that's going to become a type error.

Derick Rethans 3:35

Have you checked, how many open source projects are going to have an issue with this?

Nikita Popov 3:40

No, I haven't. Because it's not really possible to determine this using static analysis, or at least not robustly because usually null will be a runtime value. No one does this like intentionally calling strlen with a null argument, so it's like hard to detect this just through code analysis. I do think that this is actually a fairly high impact change. I remember that when PHP 7.2, I think, introduced to a warning for passing null to count(). That actually affected quite a bit of code, including things like Laravel for example. I do expect that similar things could happen here again so against have like strlen of null is pretty similar to count of null, but yeah that's why it's deprecation for now. So, it should be easy to at least see all the cases where it occurs and find out what should be fixed.

Derick Rethans 4:35

What is the time frame of actually making this a type error?

Nikita Popov 4:38

Unless it turns out that this has a larger impact than expected. Just going to be the next major version as usual so PHP 9.

Derick Rethans 4:45

Which we expect to be about five years from now.

Nikita Popov 4:49

Something like that, at least if we follow the usual cycle.

Derick Rethans 4:52

Yes. Are there any other concerns for this one?

Nikita Popov 4:55

No, not really.

Derick Rethans 4:57

Maybe people don't realize it.

Nikita Popov 4:58

Yeah, possibly. You can't predict these things, I mean like, this is going to have like way more practical impact for legacy code than the damn short tags. But for short tags, we get 200 mails and here we get not a lot.

Derick Rethans 5:14

I think this low impact WordPress a lot.

Nikita Popov 5:17

Possibly but at least the thing they've been complaining about is that something throws error without deprecation, and now they're getting the deprecation so everyone should be happy.

Derick Rethans 5:28

Which is to be fair I think is a valid concern.

Nikita Popov 5:30

Yes, it is. I've actually been thinking if we should like backport some deprecations to PHP 7.4 under an INI flag. Not like my favourite thing to work on, but people did complain?

Derick Rethans 5:47

Which ones would you put in there?

Nikita Popov 5:48

I think generally some cases where things went from no diagnostics to error. I think something that's mentioned this vprintf and round, and possibly the changes to comparison semantics. I did have a patch that like throws a deprecation warning, when that changes and that sort of something that could be included.

Derick Rethans 6:12

I would say that if we were in January 2020 here, when these things popped up, then probably would have made sense to add these warnings and deprecations behind the flag for PHP seven four, but because we've now have done 15 releases of it, I'm not sure how useful this is now to do.

Nikita Popov 6:30

I guess people are going to be upgrading for a long time still. I don't know I actually not sure about how, like distros, for example Ubuntu LTS update PHP seven four. If they actually follow the patch releases, because if they don't, then this is just going to be useless.

Derick Rethans 6:48

Oh there's that. Yeah.

Derick Rethans 6:50

There is one more RFC that I would like to talk to you about, which is the array unpacking with string keys RFC. That's quite a mouthful. What does the background story here?

Nikita Popov 7:00

The background is that we have unpacking in calls. If you have the arguments for the call in an array, then you write the three dots, and the array is unpacked into actual arguments.

Derick Rethans 7:14

I'd love to call it the splat operator.

Nikita Popov 7:16

Yes, it is also lovingly called the splat operator. And I think it has a couple more names. So then, PHP 7.4 added the same support in arrays, in which case it means that you effectively merge, one array to the other one. Both call unpacking and array unpacking, at the time, we're limited to only integer keys, because in that case, are the semantics are fairly clear. We just ignore the keys, and we treat the values as a list. Now with PHP 8.0 for calls, we also support string keys and the meaning there is that the string keys are treated as parameter names. That's how you can like do a dynamic named parameter call. Actually, this probably was one of the larger backwards compatibility breaks in PHP eight. Not for unpacking but for call_user_func_arg, where people expected the keys to be thrown away, and suddenly they had a meaning, but that's just a side note.

Derick Rethans 8:21

It broke some of my code.

Nikita Popov 8:23

Now what this RFC is about is to do same change for array unpacking. So allow you to also use string keys. This is where originally, there was a bit of disagreement about semantics, because there are multiple ways in which you can merge arrays in PHP, because PHP has this weird hybrid structure where arrays are a mix between dictionaries and lists, and you're never quite sure how you should interpret them.

Derick Rethans 8:54

It's a difference between array_merge and plus, but which way around, I can ever remember either.

Nikita Popov 9:00

What array_merge does is for integer keys, it ignores the keys and just appends the elements and for string keys, it overwrites the string keys. So if you have the same string key one time earlier and again later than it takes the later one. Plus always preserves keys, before integer keys. It doesn't just ignore them, but also uses overriding semantics. The same is the other way round. If you have something in the first array, a key in the first array and the key in the second array, then we take the one from the first array, which I personally find fairly confusing and unintuitive, so for example the common use case for using plus is having an array with some defaults, in which case you have to, like, add or plus the default as the second operand, otherwise you're going to overwrite keys that are set with the defaults which you don't want. I don't know why PHP chose this order, probably there is some kind of idea behind it.

Derick Rethans 10:01

It's behaviour that's been there for 20 plus years that might just have organically grown into what it is.

Nikita Popov 10:07

I would hope that 20 years ago at least someone thought about this. But okay, it is what it is. So ultimately choice for the unpacking with string keys is between using the array_merge behaviour, the behaviour of the plus operator, and the third option is to just always ignore the keys and always just append the values. And some people actually argue that we should do the last one, because we already have array_merge and plus for the other behaviours. So this one should implement the one behaviour that we don't support yet.

Derick Rethans 10:40

But that would mean throwing away keys.

Nikita Popov 10:43

Yes. Just like we already throw away integer keys, so it's like not completely out there. So yeah, that is not the popular option, I mean if you want to throw away keys can just call array_values and go that way. So in the end, the semantics it uses is array_merge

Derick Rethans 10:58

The array_merge semantics are..

Nikita Popov 11:01

append, like ignore integer keys just append, and for string keys, use the last occurrence of the key.

Derick Rethans 11:07

So it overwrites.

Nikita Popov 11:08

It overwrites, exactly. Which is actually also the semantics you get if you just write out an array literal where the same key occurs multiple times. Unpacking is like kind of a programmatic way to write a function call or an array literal, so it makes sense that the semantics are consistent.

Derick Rethans 11:26

I think I agree with that actually, yes. Are there any changes that could break existing code here?

Nikita Popov 11:32

Not really because right now we're throwing an exception if you have string keys in array unpacking. So it could only break if you're like explicitly catching that exception and doing something with it, which is not something where we provide any guarantees I think. So generally I think that, removing an exception doesn't count as a backwards compatibility break.

Derick Rethans 11:55

I think that's right. Do you have anything else to add here?

Nikita Popov 11:59

No, I think that's a simple proposal.

Derick Rethans 12:02

Thank you, Nikita for taking the time to explain these several RFCs to me today.

Nikita Popov 12:07

Thanks for having me Derick.

Derick Rethans 12:11

Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time.


PHP Internals News: Episode 75: Globals, and Phasing Out Serializable

PHP Internals News: Episode 75: Globals, and Phasing Out Serializable

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about two RFCs: Restrict Globals Usage, and Phase Out Serializable.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi I'm Derick. Welcome to PHP internals news, a podcast dedicated to explain the latest developments in the PHP language. This is Episode 75. In this episode, I'm talking with Nikita Popov about a few RFCs that he has been working on over the past few months. Nikita, would you please introduce yourself?

Nikita Popov 0:34

Hi, I'm Nikita, I work at JetBrains on PHP core development and as such I get to occasionally, write PHP proposals RFCs and then talk with Derick about them.

Derick Rethans 0:47

The main idea behind you working on RFCs is that PHP gets new features not, you end up talking to me.

Nikita Popov 0:53

I mean that's a side benefit,

Derick Rethans 0:55

In any case we have a few to go this time. The first RFC is titled phasing out Serializable, it's a fairly small RFC. What is it about?

Nikita Popov 1:04

That finishes up a bit of work from PHP 7.4, where we introduced a new serialization mechanism, actually the third one, we have. So we have a bit too many of them, and this removes the most problematic one.

Derick Rethans 1:19

Which three Serializable methods or ways of doing things currently exist?

Nikita Popov 1:24

The first one, which doesn't really count is just what you get if you don't do anything, so just all the Object Properties get serialized, and also unserialized, and then we have a number of hooks, you can use to modify that. The first pair is sleep and wake up. Sleep specifies which properties you want to serialize so you can filter out some of them, and wake up allows you to run some code, after unserialization, so you can do some kind of fix up afterwards.

Derick Rethans 1:52

From what I remember, if you use unserialize, where does the wake up the constructor doesn't get called?

Nikita Popov 1:59

During unserialization the constructor, never gets called.

Derick Rethans 2:03

So wake up a sort of the static factory methods to re rehydrate the objects.

Nikita Popov 2:08

Exactly.

Derick Rethans 2:08

So that's number one,

Nikita Popov 2:10

Then number two is the Serializable interface, which gives you more control. Namely, you have to actually like return the serialized representation of your object. How it looks like is completely unspecified, you could return whatever you want, though, in practice, what people actually do is to recursively call serialize. And then on the other side when unserializing you usually do the same so you call unserialize on the stream you receive, and then populate your properties based on that. The problem with this mechanism is exactly this recursive serialization call, because it has to share state, with the main serialization. And the reason for that is that, well PHP has objects, or object identity. So if you use the same object in two places you really want it to be the same object and not two objects with the same content. Serializable has to be able to preserve that, and that requires that it runs in the middle of the unserialization.

Derick Rethans 3:14

Not sure if I follow that bit.

Nikita Popov 3:16

Well maybe it's not a hard requirement more like an issue with our serialization format that comes into play here. Way PHP implements this, is using back references. So at first unserializes an object and then later you can have like a pointer back to it, that says like, I want to use the same object as at position number, 10, or so. For these back references to work, we have to actually execute the serialization handler while unserializing because otherwise the offsets will no longer match. So we can just run this at the end of unserialization for example because then our offsets would be incorrect. And this is a big problem because it's not really safe to run code, during unserialization because things are partially initialized. To make these back references work, PHP has to actually store pointers to these objects. And if you somehow modify things in specific ways, then these pointers become invalid. They point to a memory that no longer exists, and a possibly exploitable crash. This is why we would like to get rid of this mechanism.

Derick Rethans 4:25

But of course, in order to get rid of things, we had to have a better way of doing things in place first, right, which came with PHP seven four.

Nikita Popov 4:32

That's right.

Derick Rethans 4:32

So that's number three.

Nikita Popov 4:34

That's number three. Number three is actually very similar to number one: two new magic methods, double underscore serialize and double underscore unserialize. Serialize returns an array, usually like an array of properties for example, and then unserialize populates the object from that array. In practice, this works very similar to the Serializable interface, just that you don't manually call serialize and unserialize, but PHP will do so on your behalf. So you just return an array or get an array, and PHP will integrate that into the like main serialization, and because it's left to PHP, PHP can control where these calls occur.

Derick Rethans 5:19

With sleep originally you only return the name of the properties. Whereas with this new interface you return the names of the properties but also their values.

Nikita Popov 5:30

That's right. The new mechanism, this, like, in practice, it serves as a replacement for the Serializable interface. But from a technical side it's really close to sleep and wake up, um, just that, as you said, instead of returning property names you return both names and values.

Derick Rethans 5:51

And this is now the recommended way of doing serialization.

Nikita Popov 5:54

Like the motivation is one problem was, what I mentioned the security problem. Maybe the thing that impacts users more commonly is that things like calling parent::serialize and parent::unserialize with the Serializable interface, usually doesn't do what you want. Again, due to these back references because, like, the calls get out of order, we should do the same thing with the magic methods, with the underscore underscore serialize and unserialize and you can safely call parent methods and compose serialization in that way.

Derick Rethans 6:29

That's our state of serialization right now. We haven't spoken about RFC, what are you proposing to do here?

Nikita Popov 6:34

The RFC proposes to get rid of the Serializable interface. And, like in a way that is a bit more graceful than just deprecating it outright. And the idea is that if you have code that is still compatible with PHP 7.3, where the new mechanism doesn't exist, you probably still want to use Serializable. So if we just deprecated out right that would be fairly annoying to have code that's compatible with PHP 7.3, and 8.1. So instead what we do is we only deprecate the case where you implement Serializable without implementing the new mechanism. If you implement both of them, then you're fine for now.

Derick Rethans 7:15

The new mechanism, the one we're introducing PHP 7.4, would overrides the PHP 7.3 one already anyway.

Nikita Popov 7:22

Exactly. So on PHP 7.3 you would end up using Serializable and PHP seven four and higher, you would be using the new mechanism. And then, at a later point in time we would actually also deprecate Serializable itself and then remove it, though, like based on mailing list response, some people at least didn't like the long timeline. I'm not exactly sure what the alternative is, so either to deprecate Serializable right away, or to later remove it without deprecation of the interface itself.

Derick Rethans 7:57

Yeah, from what I saw the, the long-term-ness of phasing it out. I think had mentioned that it finally got removed in PHP 10, which is potentially 10 years away right. If we following every five years with a new major release. But then in the end, it does have some merit making sure that people can move on without being left in the dark at some point right. What is your own preference?

Nikita Popov 8:22

My own preference is what I proposed. I would also be fine with, like say in PHP 8.1, we call the proposal so you only get a warning if you only implement Serializable without the new mechanism, and the PHP nine we could just drop Serializable entirely. I think that would not be, because then the only problem then would be if you have code that is competitive with PHP 7.3 and PHP 9.0. I am sure that code will exist ... pretty normal version range to have.

Derick Rethans 9:08

Yeah, I probably would agree with you there. When I read the RFC it also mentioned PDO. Why would it mention PDO?

Nikita Popov 9:15

This all is something I only found out while writing it's on there is a PDO fetch serialize flag, which automatically calls unserialize when fetching values. So I will not comment on the really dubious idea of storing serialized data in the database.

Derick Rethans 9:35

I mean, people would currently said that the alternative is to store JSON, in these columns as values.

Nikita Popov 9:40

That would still be better.

Derick Rethans 9:42

But it's still a serialized format?

Nikita Popov 9:44

But at least the way this flag is implemented is effectively broken, because it doesn't just call unserialize, the function; it calls unserialize on the Serializable interface. I have no idea how this was intended to be used in practice, because it's not compatible with, like the normal serialization of the class. In practise like everything I have found about this online is basically just that okay if this functionality is broken, you shouldn't use it.

Derick Rethans 10:15

So you have less concerns just removing that straight away, I suppose.

Nikita Popov 10:19

Yeah.

Derick Rethans 10:20

Do you have anything else out about serialization.

Nikita Popov 10:22

I think this proposal is a very simple one and we have actually talked, way too much about this.

Derick Rethans 10:29

Let's move on to the next RFC, which is titled Restrict Globals Usage. This title almost sounds worse than it is as it might imply that you want to get rid of the globals array altogether. But I bet that's not the case. And I also suspect that restricting the globals array is a lot more technical as a subject as it might seem.

Nikita Popov 10:49

That's right. So this is really, mostly motivated by internal concerns, and has hopefully not a great deal of impact on like practical usage. There are a couple motivations, so some of them are about semantics, so globals is a very magic variable, that does not follow the usual semantics of PHP a number of ways. In particular array are typically by value. In all other cases, they are by value, which means that if you modify, like if you copy an array and modify one copy, then the other one doesn't get modified, I mean it's a copy so obviously it doesn't get modified. For globals if that's not the case. If you make a copy of globals and you modify the copy, then the original array also gets modified.

Derick Rethans 11:36

Which is not the case for other super globals such as underscore get and underscore post.

Nikita Popov 11:41

The other super globals are a bit magic but not that magic. There are a couple of other concerns with edge cases, but I think the real motivation here is the internal concern. And that's how globals is implemented. PHP, normally, manages variables in functions and scripts, using so called compiled variables. And this works by well when the script is compiled we actually see all the variables with the used, at least all the variables that don't go through something like variable variables or globals or something like that. And we reserve a slot for each of these variables, so we can directly access it. We don't have to look up, like the variable by name, we just say this is variable number seven and we can directly access it, which is much much more efficient. The problem is, then if you have something that globals you want to both have this access by index, and access by name, and they do that by storing a pointer inside the globals array to the actual location of the variable. Yeah, so this is a very special concept. So we call this an indirect, a variable of indirect type, and it essentially occurs only inside the globals array, and for object properties. For object properties it happens for the same reason, so object properties are normally accessed by index, but if you do something like variable object dynamic object access, then we also have to look it up by name. There we do the same thing, so we have a like map from property names to values, and if the value is really stored inside an object property slot then we just store a pointer there. The thing with the objects is that this is like really an internal concern that's well encapsulated and doesn't leak into normal PHP code. That's not the case with globals because globals is on the surface just a normal array. So you can do everything with it, you do with a normal array you can pass it to functions. Like in theory, all the functions, need to deal with this special value type that says: okay actually this is not the value itself is just a pointer to the value. The way you do it is every time you access a value you check okay is this an indirect value; if it is, follow the pointer.

Derick Rethans 14:01

I have plenty of code in Xdebug for this.

Nikita Popov 14:04

So it's really a super simple operation to do, but you actually have to do it. And you have to do it absolutely everywhere, if you're being pedantic. In practice that just doesn't happen. In PHP's own code, in the standard library, the array functions are those do consistently handle this edge case. But if you like go further, even most bundled extensions, and certainly most third party extensions, they are not going to do this and if they don't either they just get some, like you know benign misbehaviour where it looks like array elements are missing, or you get a crash, because the type is simply not handled. Yeah, well that's not a great state to be in, because like pushing passing the globals array into something like array pop or something, is very weird operation to do. I don't know if ever, anyone has done that for purposes outside testing PHP. But to support it, we have to like handle this special case everywhere, which is not robust and also has a certain performance impact when it comes to low level operations. So we also have to do this check every time you access an array for example from normal PHP code The idea is to remove the special case. That's the motivation here.

Derick Rethans 15:23

What are you proposing to change?

Nikita Popov 15:26

One is if you just access variable in globals. So you write $GLOBALS[], some variable name. Then we treat that especially and compile it down to an access to this global variable. So it could be a read access, could be a write access, or anything else,

Derick Rethans 15:44

But it is something that happens, when PHP compiles scripts.

Nikita Popov 15:48

That's right. The second part is you can also access the globals array in a read-only way, so you can take the whole array, and for example, do a for each loop over it. And that continues to work. The part that doesn't work is to take the whole globals array and modify it in some way, for example, passing globals to array pop, which requires passing it by reference is going to throw an error.

Derick Rethans 16:13

At which state. Is that going to throw an error?

Nikita Popov 16:15

That's usually during compilation, but specifically for the case of by-reference passing it can't be detected at runtime, because we don't always know if it's a by-reference or by-value pass. But for most of the cases it's a compile time error. Maybe one particular case that's worth mentioning is that you also can do a foreach by-reference over it. So if you like want to loop over globals and modify entries while doing so the way to do it now would be to do by-value loop and then just again access specific elements in it, like access globals key or something. And the reason why this helps us is that we can just return, like when you access globals, we can actually return a copy of the array. We don't have to maintain these like indirect pointers which are only necessary to support modifications, we can just return a copy. That means we no longer have to deal with this edge case in most places, in the engine and in third party extensions,

Derick Rethans 17:15

Talking about third party extensions, the code that implements this RFC has already been merged into PHP eight one, but the moment you did that, tests in Xdebug started failing, because I read the globals array, but it doesn't seem like it exists any more now.

Nikita Popov 17:31

That's actually a good point. Globals, I would know view it as a like, more like a syntax construct, similar to variable variables, or even the $this variable. So this is also not a real variable. Globals is no longer added as an actual variable in the symbol table, which is directly compiled down to either an access to the specific global or returns a copy of the table. So for Xdebug you, I probably filter you you have to access the EG symbol table.

Derick Rethans 18:02

Yes, but it wasn't as simple as it seemed because this is a hash table, and no longer is that a full array, which means that all my logic code doesn't work with that. So I've decided that globals just no longer exists and stuff, which is what it logically is in PHP eight one anyway.

Nikita Popov 18:22

So that might actually be nice. So I know that, like code that does work with globals, like as an array, usually also always skips skips globals itself when iterating over it, because otherwise you usually run into some kind of infinite recursion issue. That's actually another thing, so globals is the one way you can have a recursive array, without references being involved. So I know that the Symfony like variable/cloner dumper. That goes for a lot of effort to detect cycles, like has some extra fun hacks to detect globals correctly for that reason, because usually you just take references but for globals that doesn't work.

Derick Rethans 19:09

Right, how much of an impact is this going to have to existing code?

Nikita Popov 19:12

So I like analysed the top composer packages and found, not a lot of usages. I don't remember the exact number, it was maybe five cases that break. That's not to say that it has no impact. I do know that PHPUnit eight point whatever, had such a globals use, which was fixed already because Sebastian Bergmann now, adds support for new PHP versions to PHPUnit eight and nine both. If you're using PHPUnit seven, then probably, it's no longer going to work for that reason. Of course, it also doesn't work for many other reasons, as well. Depending on which features to use, but I do know that you know sometimes if you're not using mocks, then you can often use old PHPUnit versions, but I think that's no longer going to work in this case.

Derick Rethans 20:04

It's something that users of PHP and PHPUnit, probably should start testing once the alpha and beta releases of PHP eight one start happening.

Nikita Popov 20:16

Right. I mean, I hope that it's not going to be a big issue. After all, this is minor PHP version. So we really shouldn't be introducing bad breaks, but at least the usage I've seen in open source project suggests that it should not be a big problem.

Derick Rethans 20:33

Excellent. As I've mentioned this RFC is already been merged. So I don't really have to ask about feedback, because it's irrelevant right now. It's already there.

Nikita Popov 20:44

Well, you could still have feedback afterwards.

Derick Rethans 20:48

Thank you, Nikita for taking the time to explain these several RFCs to me today.

Nikita Popov 20:52

Thanks for having me Derick.

Derick Rethans 20:57

Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time.


PHP Internals News: Episode 74: Fibers

PHP Internals News: Episode 74: Fibers

In this episode of "PHP Internals News" I talk with Aaron Piotrowski (Twitter, Website, GitHub) about an RFC that he is proposing to add Fibers to PHP.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is Episode 74. Today I'm talking with Aaron Piotrowski about a Fiber RFC, that he's working on together with Nicolas Keller. Aaron, would you please introduce yourself.

Aaron Piotrowski 0:33

Hi everyone I'm Aaron Piotrowski, I started programming with PHP back in 1998 with PHP three, so I've just dated myself there but, but I've worked with a lot of different languages over the last few decades but PHP is always continually remaining, one of my favourite and I'm always drawn back to it. I've gotten a lot more involved with the PHP projects since PHP seven. The Fiber RFC is my first major contribution I have attempted though. In the past I did the RFC for the throwable exception hierarchy. And the Iterable pseudo type in PHP 7.1.

Derick Rethans 1:12

Yeah, these things are both before I started doing the podcast so hence we haven't spoken yet, at least on here. We've actually met at some point in the past. I've had a read through the Fiber RFC this morning, but I'm still fairly baffled. Could you perhaps explain in short what Fibers are where the idea comes from. And what's your specific interest is in adding them to PHP?

Aaron Piotrowski 1:35

A few other languages already have Fibers like Ruby, and they're sort of similar to threads in that they contain a separate call stack, and a separate memory stack, but they differ from threads in that they exist only within a single process, and that they have to be switched to cooperatively by that process rather than actively by the OS like threads. So sometimes they're called Green threads, and the generators that are in PHP already are actually somewhat similar to Fibers; but generators differ in that they're stack less. And so what that means is that generator function can only be interrupted at one layer. Whereas a Fiber can be interrupted anywhere in the call stack. So like it'd be imagine if you had a generator where yield could be very deep in a function call. Rather than at the top level. Like, how generators can be used to make interruptible functions, Fibers can also be used to create similarly interruptible functions, but with again without having to know exactly when it's going to be interrupted not at the top level but at any point in the call stack. And so the main motivation behind wanting to add this feature is to make asynchronous programming in PHP much easier and eliminate the distinction that usually exists between async code that has used promises and synchronous code that we're all used to.

Derick Rethans 3:09

So what specifically are you proposing to ask to PHP here then?

Aaron Piotrowski 3:12

Specifically I'm looking at adding a low level Fiber API, that's really aimed specifically at async framework authors to create their own more opinionated API's on top of that low level API. So adds just a couple of classes: Fiber, and a FiberScheduler on within a couple of exception classes and reflection classes for inspecting Fibers. When a Fiber is suspended to the execution switch is to FiberScheduler, which is then a special Fiber, that's able to start and resume, regular user Fibers. So a Fiber scheduler is generally going to be something like an event loop that then, when a Fiber is suspended that our scheduler event loop will resume certain Fibers, like in response to events like data becoming available on a socket or like a timer expiring.

Derick Rethans 4:17

How would the event loop, decide which Fiber to resume, depending on on input, for example?

Aaron Piotrowski 4:24

It's largely up to, how like that framework choose to write that event loop, but in general, like when a Fiber is going to suspend it'll set up some sort of callback, or add it to like an array of Fibers that's waiting on events, and when execution switches to the event loop.

Derick Rethans 4:44

A Fiber before it suspends itself set up add another Fiber to the scheduler.

Aaron Piotrowski 4:48

That's exactly right. Before a Fiber suspends itself it adds itself to some sort of event in the event loop that when it triggers, it will resume that Fiber. So, if you're familiar with how some of the other async frameworks that work now they'll add something like a callback or promise to the event loop that's resolved. This is sort of working the same way except that it's just resuming a Fiber that of like invoke a callback, although you know that Fiber might be resumed and by invoking a callback.

Derick Rethans 5:23

Is the Fiber scheduler or the event loop as however if you want to call it, is that something you would, or can also make use of in a normal PHP applications, I mean by normal I mean, not like an async PHP framework? Is that the intention is all?

Aaron Piotrowski 5:39

Using Fibers than kind of helps you eliminate that boundary that that exists, trying to put asynchronous code into something using synchronous code because you end up with a promise that you have to await in that doesn't really work very well, where you're trying to mix it with sync code, Fibers eliminate the need for a promise. Since, the asynchronous function can still return types, you can mix async code into a traditional like sync application, even like something running in Nginx or Apache, it doesn't have to be a fully asynchronous app to make use now of some async I/O.

Derick Rethans 6:23

For example if he wants to do multiple database calls at the same time. Will you be able to use a uses for that?

Aaron Piotrowski 6:30

Exactly.

Derick Rethans 6:31

If a user would have a PHP application and want to use multiple database calls at the same time, how would, how would they set it up it with Fibers and Fibers scheduler?

Aaron Piotrowski 6:40

This is a low level API is aimed primarily at like a framework author. Generally if you're writing application with it, you're probably going to use one of those frameworks so it would largely depends on how that framework would set that up. Although, in general, those frameworks are going to provide some sort of abstraction for running code concurrently, that they probably have their own sort of placeholder object like like a promise again. So that when you start running things concurrently, they return to something that you can then wait on for all those things to end up, or when those things, complete executing. So it doesn't totally eliminate the need for promises, but it does allow for both to do not always that async to not always have to return a promise rather a promise is only required when you want concurrency, and that, you know, a framework will provide tools to await that can still be mixed in with synchronous code.

Derick Rethans 7:48

Do I understand this correctly that you won't need to promise unless you mix it with synchronous code?

Aaron Piotrowski 7:54

You won't need a promise unless you explicitly need concurrency.

Derick Rethans 7:57

Okay, that makes more sense I suppose.

Aaron Piotrowski 7:59

It's difficult to explain it's so much easier with examples.

Derick Rethans 8:03

Yeah but examples are very difficult to do in audio only.

Aaron Piotrowski 8:07

Yes, exactly. You have like a database query that returns a result. If you want to run multiple queries at the same time, the async library that uses Fibers underneath would be able to provide an abstraction that would allow you to run multiple queries at once. But that those two run concurrently would return a promise. But you would be able to collect those promises together, and use like a await function, provided by that async framework to then get the results of all of the queries at once.

Derick Rethans 8:47

You mentioned that Fibers are not threads, they just are more, they're sort of logical threads, but not physical threads in the same process. PHP isn't multi threaded, how would this work internally? What would have Fiber do or store, so that the scheduler can resume them for example? What is the internal mechanism, how does this interact with PHP itself.

Aaron Piotrowski 9:11

Each Fiber is allocated a C stack and a VM stack on the heap. So switching between them is similar to generators, when switching between Fibers the current VMs stack is swapped and the C stack is swapped, but it doesn't touch any of the other memory in the process, so things like globals are still accessible to each Fiber, since only one Fiber can be executing at the same time, you don't have some of the same race conditions that you have with threads of memory being accessed or written to by two threads at the same time. It can't happen with Fibers that you can have two Fibers that might be dependent on the same memory, and you may have to do some of the same sort of synchronization, that you have to do with threads to that memory if you don't want interleaving of Fibers to be potentially overriding that memory. That's the sort of thing that's being left again to like the async frameworks that would use this to provide that sort of mechanism over a low level Fiber API.

Derick Rethans 10:15

Of course when a Fiber is running, there's no need for locking anything because nothing runs at the same time anyway. And of course, when a Fiber suspense itself it then sort of knows that, well, I'm unlocking what I'm wanting to use of don't have this synchronization issue there.

Aaron Piotrowski 10:32

You don't have the synchronization issue where you have to worry that while while this Fiber is running, another Fiber might overwrite the same memory. But there is a potential that if a Fiber suspends that while it's suspended another Fiber could have overwritten some global memory, so if you're if you're sharing memory between Fibers it's best to use some sort of abstraction, like channels in Go to share data between Fibers rather than like a global. It could just be a global, it could even be like a class property or something, anything that you might share between two Fibers you could give the same object to two different Fibers, and those Fibers could modify that object. Well, I wouldn't recommend doing that, I would share that object over like a channel instead.

Derick Rethans 11:23

Your RFC doesn't talk about channels. So, I reckon that'd be something else that has to be implemented, probably with Fibers in the async framework.

Aaron Piotrowski 11:31

Exactly, yes.

Derick Rethans 11:32

What is your reason to want to others to PHP core instead of having it sitting in a PECL extension because I could argue that this isn't something that many PHP developers would ever use.

Aaron Piotrowski 11:43

I definitely see that point. I think that availability for being able to use that in any sort of application would be important for some reason there still seems to be a hesitation on certain platforms to install extensions. But more beyond that, there are reasons that you'd want to have it in core all the time, extensions that would want to profile code will need to be aware of Fibers. And if, if Fibers are an extension well then actually making use of it in a real application might be difficult because your code profilers don't work very well because they don't understand the Fiber switching. So that is one area that if this were merged into core, code profilers would probably have to be updated to account for that. There was also a bit of an issue in the extension right now that due to destructor order, how the shutdown logic goes. And what hooks are available in PHP, that if a registered shutdown function or a destructor suspends a Fiber, it might have to restart the scheduler unnecessarily. But if it were in core, I could avoid that. And then there's there's also issues with how to handle some of the global stacks that PHP provides when switching Fibers should those be reset, should they remain, but those are issues that can only be addressed if Fibers were part of the core rather than extension. Otherwise I have no choice but to just leave them as stacks that aren't switched.

Derick Rethans 13:22

Okay yeah that makes sense, because the stack switching is something that is trickier to do from an extension.

Aaron Piotrowski 13:28

Like the error handler, you know, how should that be handled. Should it be the error handler stack depends on which Fiber or should it remain just a constant global and I can't change that from an extension that would have to be part of core.

Derick Rethans 13:41

Because Fibers allow you to basically switch between threads. Have you had a look at how how debuggers, for example deal with this?

Aaron Piotrowski 13:50

In my testing with Xdebug, I didn't have any issue with inspecting execution stacks, or code coverage, that I will have to really defer to you. If you think that there's any anything that in Xdebug that would have to be updated or changed to accommodate. So far it's worked very well.

Derick Rethans 14:10

I know you submitted a bug report with a crash, but that's been fixed already, of course. What was that issue actually, I don't quite remember what it was?

Aaron Piotrowski 14:18

Something code coverage where I honestly don't really remember any more. It is invalid pointer for something.

Derick Rethans 14:26

It's an interesting thing that's with all these fancy extensions, and Fiber and not being the only, sometimes you run into things that extensions do something very strange that, then make things crash in Xdebug. I can't always test for that of course up front. I actually have a slightly related question that pops into my head here is like, there's also something called Swoole PHP, which does something similar, but from what I understand actually allows things to run in threads. How would you compare these two frameworks, or approaches is probably the better word?

Aaron Piotrowski 15:00

Swoole is, they try and be the Swiss Army knife, in a lot of ways where they provide tools to do just about everything and they provide a lot of opinionated API's for things that, in this case I'm trying to provide just the lowest level, just the only the very necessary tools that would be required in core to implement Fibers, I do believe Swoole implements Fibers as well. They use the term co-routine for their Fibers. I believe they actually use the same boost assembly language code that I used for swapping C stacks. I'm not sure if they provide actual threading as well. If they do, then that's great. Of course threading still requires a ZTS build of PHP. Fibers do not because it's still within one process.

Derick Rethans 15:55

I know that Swoole definitely doesn't work with Xdebug because the way how they do things, but it sounds like Fibers will actually work just fine.

Aaron Piotrowski 16:02

It seems so yes. I've used it already extensively with PhpStorm like setting breakpoints and things to debug. When I was upgrading some of the, the AMP libraries to figure out what was going wrong and it worked perfectly.

Derick Rethans 16:16

Are you involved with AMP.

Aaron Piotrowski 16:18

Yes, I am. One of the primary maintainers now along with Nicolas. I didn't start the library. The original author has moved on to other things, but it's it's pretty much just Nicolas and I doing most of it now. Bob still contributes occasionally as well.

Derick Rethans 16:38

And I guess that's why are you interested in having Fibers in PHP come from then?

Aaron Piotrowski 16:42

Yes, exactly.

Derick Rethans 16:44

What has the feedback been so far?

Aaron Piotrowski 16:47

Largely positive from the people that are more familiar with it. I haven't actually gotten a whole lot of feedback from the core contributors of PHP, so I'm not really sure where the proposal stands with them at the moment, but I guess maybe no feedback is good feedback if they had a problem with it somebody who's spoken up by now, I'm not sure.

Derick Rethans 17:09

That is often the case right, if it's if there is something to be added that is quite complicated, you get a lot less feedback. Then where there's something very simple like picking a name for function right.

Aaron Piotrowski 17:19

Yes, exactly.

Derick Rethans 17:21

When do you think your will be putting this up for a vote?

Aaron Piotrowski 17:24

I think I want to wait at least another month or so. I did make a recent change to how the Fiber scheduler API worked, and so I wanted to make sure that that people had time to review it. Maybe send another reminder email or two to internals, so that they, so that more people get a chance to look at it and play with it and provide feedback.

Derick Rethans 17:47

Somewhere around mid February?

Aaron Piotrowski 17:49

Something like that, yeah.

Derick Rethans 17:51

Did we miss anything discussing Fibers. Do you have anything to add yourself?

Aaron Piotrowski 17:55

No, I don't really think so. I think we covered the main points of it.

Derick Rethans 17:59

I have to say I understand that quite a lot better now, which is always good, and hopefully the people listening to this episode will also find it interesting and understand it well. So I would say thanks for explaining Fibers to me today.

Aaron Piotrowski 18:13

Yeah, thanks a lot for having me on.

Derick Rethans 18:18

Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time.


PHP Internals News: Episode 73: Enumerations

PHP Internals News: Episode 73: Enumerations

In this episode of "PHP Internals News" I talk with Larry Garfield (Twitter, Website, GitHub) about a new RFC that he is proposing together with Ilija Tovilo: Enumerations.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi I'm Derick and welcome to PHP internals news that podcast dedicated to explain the latest developments in the PHP language.

Derick Rethans 0:22

This is Episode 73. Today I'm talking with Larry Garfield, who you might recognize from hits such as object ergonomics and short functions. Larry has worked together with Ilija Tovilo on an RFC titled enumerations, and I hope that Larry will explain to me what this is all about. Larry, would you please introduce yourself?

Larry Garfield 0:43

Hello World, I'm Larry Garfield, I am director of developer experience at platform.sh. We're a continuous deployment cloud hosting company. I've been in and around PHP for 20, some odd years now. And mostly as an annoying gadfly and pedant.

Derick Rethans 1:00

Well you say that but in the last few years you've been working together with other people on several RFCs right, so you're not really sitting as a fly on the wall any more, you're being actively participating now, which is why I end up talking to you now which is quite good isn't it.

Larry Garfield 1:15

I'm not sure if the causal relationship is in that direction.

Derick Rethans 1:18

In any case we are talking about enumerations or enums today. What are enumerations or enums?

Larry Garfield 1:26

Enumerations or enums are a feature of a lot of programming languages, what they look like varies a lot depending on the language, but the basic concept is creating a type that has a fixed finite set of possible values. The classic example is Boolean; a Boolean is a type that has two and only two possible values: true and false. Enumerations are way to let you define your own types like that to say this type has two values, sort ascending or descending. This type has four values, for the four different card suits in a standard card deck, or a user can be in one of four states: pending, approved, cancelled, or active. And so those are the four possible values that this variable type can have. And what that looks like varies widely depending on the language. In a language like C or c++, it's just a thin layer on top of integer constants, which means they get compiled away to integers at compile time and they don't actually do all that much, they're a little bit to help for reading. At the other end of the spectrum, you have languages like rust or Swift, where enumerations are a robust Advanced Data Type, and data construct of their own. That also supports algebraic data types, we'll get into that a bit more later. And is a core part of how a lot of the system actually works in practice, and a lot of other languages are somewhere in the middle. Our goal with this RFC, is to give PHP more towards the advanced end of enumerations, because there are perfectly good use cases for it so let's not cheap out on it.

Derick Rethans 3:14

What is the syntax?

Larry Garfield 3:15

Syntax we're proposing is tied into the fact that enumerations as we're implementing them, are a layer on top of objects, they are internally objects with some limitations on them to make them enumeration like. The syntax is if I can do this verbally: enum, curly brace, a list of case foo statements, so case ascending, case descending, case hearts, case spades, case clubs, space diamonds, and so on. And then possibly a list of methods, and then the closed curly brace. So it looks an awful lot like a class because internally, the way that we're implementing them, the enum type itself is a class, it compiles down to a class inside the engine, but instead of being able to instantiate your own instances of it. There are a fixed number of instances that are pre created. And those are the only instances that are ever allowed to exist. And those are assigned to constants on that class, and you have the user status, then you have a user status enum, which has a case approved, or active, and therefore there is in the engine, a class user status which has a constant on it, public constant named approved, whose value is an instance of that object. And then you can use that like a constant, pretty much anywhere and it will just work. The advantage of that over just using an object like we're used to, is you can now type a parameter, or a property or return type for: This is going to return a user status, which means you're guaranteed by the syntax that what you get back from that method is going to be one of those four value objects, each of which are Singleton's so they will always identity compare against each other.

Larry Garfield 5:07

So just like if you type something as Boolean, you know, the only cases you ever have to consider are true and false. If you type something as user status, you know, there are only four possible cases you ever have to think about, and you know exactly what they mean and you can document them, and you don't need to worry about what if someone passes in, you know, a string that says cancelled instead of rejected or something like that. That's syntactically impossible.

Derick Rethans 5:32

What happens if you do that?

Larry Garfield 5:33

You get a syntax error. If for example, a method is typed as having a parameter that takes a user status and you pass in the string rejected. Well, it's a string but what's expected is a user status object. So it's going to type error.

Derick Rethans 5:47

You get a type error yes.

Larry Garfield 5:49

If you try and pass in user status, double colon rejected, when that's not a type, you get an error because there is no constant on that class named rejected therefore, you'll get that error instead. Building on top of objects means an awful lot of functionality, kind of happens automatically. And the error checking you want to kind of happens automatically. And it's in ways that you're already used to working with variables in PHP.

Derick Rethans 6:14

For example, it also means that enums will be able to be auto loaded because they're pretty much a class?

Larry Garfield 6:19

Exactly. They autoload exactly the same way as classes, you don't you didn't even do anything different with them just follow the standard PSR four you're already following, and they'll auto load like everything else.

Derick Rethans 6:28

That also means that class names and enum names, share the same namespace then? You can't have an enum with the same name as a class.

Larry Garfield 6:36

Right. Again, making it built on classes means all of these. So it's kind of like a class, the answer is yes, almost always, and it makes it easier to work with is more convenient, it's like you're used to.

Derick Rethans 6:47

I suppose it's also makes it easier to implement?

Larry Garfield 6:50

Substantially.

Derick Rethans 6:51

I'm familiar with enums in C, or c++ where enums is basically just a way of creating a list of integers, that's auto increase right, then I've been reading the RFC, it doesn't seem to do anything like that. How does this work, are the constants on the enums classes, are they sort of backed by static values as well?

Larry Garfield 7:12

They can be. There's some subtlety here that's one of the last little bits we're still ironing out as we record this, the user status active constant is backed by an object that is an instance of user status and has a hard coded name property on it, called active. That is public but read only. That's how internally in the engine, it keeps track of which one is which. And there is only ever that one instance of user status with the name active on it. There is another instance that has name cancelled or name pending or whatever.

Derick Rethans 7:53

There's no reason to be more than one instance of these at all.

Larry Garfield 7:58

Then you can look at that name property but that's really just for debugging purposes, most of the time you'll just use user status double colon active user colon user status colon approved, whatever as a constant and, roll, roll with that. The values themselves don't have any intrinsic primitive backing, as we've been calling it. You can optionally specify that enumeration values of this type, are going to have an integer equivalent, or a string equivalent. And those have to be unique and you have to specify them explicitly. The syntax for that would be enum user status colon string, curly brace, blah blah blah. And then each case is case active equals.

Derick Rethans 8:46

Is it required that the colon string part is part of your enum declaration?

Larry Garfield 8:51

If you wanted to have a primitive backing, yes, a scalar equivalent. We're actually calling its scalar enums at the moment because that's an easier word to say than primitive. The idea here is enumerations themselves conceptually should not just be an alias for a scalar, they have a meaning conceptually unto themselves rather than just being an alias to an integer. That said, there are plenty of cases where you want to be able to convert enumeration case/enumeration value, into a primitive to store it in a database, to show it on a web page, where I can't store the object in a database that doesn't really work well, MySQL doesn't like that, I added this ability to define a scalar equivalent of a primitive. And then, if you do that, you get two extra pieces that help you. One is a value property, that is again a public read only property, which will be that value, so if approved, or active is a, then user status double colon active arrow value will give you the value A. Which means if you have an enum and you want to save it to a database table, you just grab the value property and that's what you write to the database, and then it's just a string or integer or whatever it is you decide it to make that. When you then load it back up, there is a from method. So you can say user status double colon from. If it's some primitive, and it will return the appropriate object Singleton object for that type so user status double colon active.

Derick Rethans 10:28

You mentioned that enums are quite like classes, does that also mean, you can define methods on the enums?

Larry Garfield 10:35

Enums can have methods on them, both normal objects methods and static methods. They can also take interfaces, and then they have to conform to that interface like any other class. They can also use traits, as long as the traits do not have any properties defined, as long as they are method only traits. We are explicitly disallowing properties, because properties are a way to track and maintain and change state. The whole point of an enumeration is that the enumeration value with the case is a completely defined instance of into itself and user status active is always equal to user status active.

Derick Rethans 11:16

And you can't do that, if there are properties.

Larry Garfield 11:19

Right. You don't want the properties of user status to change dynamically throughout the course of the script that's not a thing.

Derick Rethans 11:25

Yeah, what would you use methods for on enums?

Larry Garfield 11:29

There's a couple of use cases for dynamic methods or object methods, the most common would be a label method. As an example, if you have a user status enumeration, the example we keep going back to. You can have a method for label. Give each of them a make it a scalar enum, so they all have a string or an integer equivalent that you can save for a database, then you give them all a label method. A which in this case would be a single label method that has a match statement inside it and just switches based on the, the enumeration, expect matching enumeration fit together perfectly they're, they're made for each other,

Derick Rethans 12:07

And also match will also tell you if you have forgotten about one of the cases.

Larry Garfield 12:12

Ilija started working on match in PHP 8.0, he knew he wanted to do enumerations, I came in later. This is a long plan that he's been working on I've joined him. You have a label method that matches on the enumeration value, which is dollar this inside the method will refer to the enumeration object you're on, just like any other, and then return a human friendly label for each of those. You can then call a static method called cases, which will return an array of the enumeration cases you have, you'll loop over that and read the value property, and you read the label, and you stick that into a select box or checkboxes or whatever, you know template system you have. There you have a way to attach additional static data to a enumeration without throwing on more stateful properties. That's the most common case, there are others. You can have a enumeration method that returns a specific other enumeration case. Like if you're on enumeration you say what's next, you're creating a series of steps like a junior very basic state machine. And you call next and it'll return, another enumeration object on the same type for whatever the next one is, or whatever else you can think of them. I'm sure there are other use cases I'm not thinking of. For static methods, the main use case there is as an alternate constructor. So if you want to say you have a size enumeration small, medium, large, you have a from length static method, you pass it an integer, and it will map that integer to the small, medium or large enumeration case based on whatever set of rules how you want to define which size is which. Probably the most common case for our static method is that kind of named constructor, essentially, maybe others, cool, come up with some.

Derick Rethans 14:19

During our compensation you mentioned two functions: cases and from, where do these methods come from?

Larry Garfield 14:26

cases is defined on all enumerations period, it just comes with it. You are not allowed to define your own, and it returns. It's a static method, and it returns a list of the instances for that that enum type from the static method that is generated automatically on scalar enum so if there's a scalar backing, then this from method gets automatically generated, you cannot define it yourself it's an error to do so.

Derick Rethans 14:55

Is it an error, even if it isn't a scalar enum, to define the from method?

Larry Garfield 15:01

At the moment I don't believe so you can put a from on a non scalar enum, like unit enum. We recommend against it. Maybe we should block that just for completeness sake, I'll have to talk to Ilia about that. There's a few edge cases like that we're still dealing with. Nikita has a suggestion for around error handling with from, for what happens if you pass an invalid scalar to from, we're not quite sure what the error handling of that is. It may involve having an extra method as well. In that case, details like that we're still working on as we speak, but we're at that level of shaving off the sharp corners, the bulk of the spec is pretty well set at this point.

Derick Rethans 15:45

When I read the RFC it mentioned somewhere that adding enumerations to PHP is part of a larger effort to introduce something called algebraic data types. Can you explain what these are, and, and which future steps and proposals would additionally be needed to support these there?

Larry Garfield 16:05

We're deliberately approaching this as a multi part RFC. The larger goal here is to improve PHP's ability to make invalid states, unrepresentable, which is not a term that I have coined I've read it elsewhere. I think I've read it multiple elsewhere, but its basic idea is you use the type system to make it impossible to describe states that are allowed by your business rules, and therefore, you don't need to write error handling for them. You don't need to write tests for them. Enumerations are a form of that, where you don't need to write a test for what happens when you pass user status joking, to a method, because that's a type error, that's already covered, you don't need to think about it any more.

Derick Rethans 16:53

Would that be a type error or would you get an error for accessing a constant, that doesn't exist on the enum if you say user status joking?

Larry Garfield 17:02

I think you'd actually get an error on a constant that doesn't exist in that case. If you just pass a string, an invalid string to the method and you would get a type error on string doesn't match user status, and the general idea being you structure your code, such that certain error conditions, physically can't happen. Which means you don't need to worry about those error conditions, it's especially useful where you have two things that can occur together but only in certain combinations, my favourite example here, if you're defining an airlock. You have an inner door and an outer door. The case of the indoor and outdoor both being open at the same time is bad, you don't want that. So you define your possible states of the airlock using an enumeration to say: inner door closed out other door closed, is one state is one enumeration case; inner open outer closed as one state, and inner closed outer open is one state. Those are the only three possible values that you can define so you cannot even describe the situation of both doors are open and everyone dies. Therefore, you cannot accidentally end up in that state. That's the idea behind making invalid states unrepresentable. Algebraic data types are kind of an extension of the idea of enumerations further by adding the ability for the cases, to have data associated with them. So they're no longer singleton's. There's a number of different ways that you can implement that different languages do it in different ways. Most of them, bake them into enumerations somehow for one reason or another, a few use the idea of sealed classes, which are a class, which is allowed to be extended by only these three or four or five other classes, and those are the only subclasses you're allowed to have. They're conceptually very similar, we're probably going to go with enumeration based, but there's still debate about that, there's no implementation yet. The idea with ADTs as we're envisioning them is, you can define, like, like you can have an enum case that is just its own thing. You can have an enum case that's backed by a scalar, or you can have an enum case, that has data values associated with it. And then it's not a singleton, can spawn new instances of it. But then those instances are always immutable. The most common example of that that people talk about would be: You can now, implement, monads very easily. I'm not going to go into the whole, what is a monad here.

Derick Rethans 19:36

It's a word that I've heard many times but have no idea what it means.

Larry Garfield 19:40

You should read my book I explained it wonderfully. But it allows you to say, the return value of this method is either nothing, or there is something and that something is this thing. An enumeration, where the two possible types are nothing, and something. And the something has parametrised with the thing that it actually is carrying. That's one use case. Another use case Rust actually in their documentation is a great example of this. If you're building a game, then the various moves that a player can take: turn left, turn right, move forward, move backward, shoot, back up, those are a finite set of possible moves so you make that an enumeration. And then those are the only moves possible in the code. But some of them, like, move forward by how much. By how much is a parametrised value on the move forward enum case. And you can then write your code to say alright, these are the five possible actions a player can take; these three have no extra parameters, this one has an extra parameter. And I don't need to think about anything else because nothing else is can even be defined. So that's the idea behind ADTs, or algebraic data types. There are way to do the kind of things you can do now just in a much more compact and guaranteed fashion, you can do that same thing now with subclasses, but you can't guarantee that no one else is adding another subclass you have to think about. With an ADT you can guarantee that there's no other cases and there will be no other cases.

Larry Garfield 21:17

Another, add on that we're looking at is pattern matching. The idea here is match right now, the match construct in PHP eight, do an identity match. So match variable against triple equals, various possibilities, and that covers a ton of use cases it's wonderful. I love match. However, it would be also very useful to say match this array, let's say an associative array, against a case where the Foo property is bar, I don't care about the rest. If the case where the foo property is beep. And I don't care about the rest I'm gonna do stuff with the rest on the right side of match. With enumerations, that comes in with ADTs, where I want to match against: Is it a move left match against move left and then I want to extract that how far is the right hand side and do something with that. If it's a move right on extract the how far is that on the right. If it's a turn left. Well that doesn't have an associated property so I can match that and do whatever with that. It's a way to deconstruct some value, partially, and then take an action based on just part of that object. That's a completely separate RFC, that, again, we think would make enumerations, even more powerful, and make it a lesson number of other things more powerful of make the match statement, a lot more robust, all the pieces of that we're looking at is looking at match against type, so you can do an instance of or an isint or whatever check as part of the match. Again this is all still future RFC nothing has actually been coded for this yet, it's in the, we would like to do this next if someone lets us

Derick Rethans 23:12

Coming back to this RFC. What has the feedback been to it so far?

Larry Garfield 23:17

Very positive. I don't think anyone has pushed back on the idea. Or earlier draft used a class per case, rather than object per case, which a couple of people push back on it being too complicated so we simplified that to the object per case. And at this point, I think we're just fighting over the little details like: Does that from method return null, or does it throw? What does it throw? Do we have a second method that does the opposite? Exactly what does reflection look like? We have an isenum function or do we have an enum exists function? We're at that level. I expect this RFC is probably going to pass with like 95%. At this point, something like that. It's not a controversial concept, it's just making sure all the little details are sorted out, the T's are crossed, and the i's are dotted and so on.

Derick Rethans 24:09

Did we miss anything in the discussion about enums, do you have anything to add?

Larry Garfield 24:15

I don't think so, at least as far as functionality is concerned. To two other things process wise: one, I encourage people to look at this multi stage approach for PHP. I know I've talked about this on, on this podcast before; having a larger plan that you can work on in pieces and then multiple features that fit together nicely to be more than the sum of their parts. It's a very good thing, we should be doing more of that, and collaborating on RFC so that small targeted functionality adds up to even more functionality when you combine them, which is what we're looking to do with enums, and ADTs, and pattern matching for example. Credit where it's due, Ilija has done all of the code for this patch. I'm doing design and project management on this, but actually making it work is 100% Ilija he deserves all the credit for that, not me.

Derick Rethans 25:06

Thank you, Larry for explaining enums to me today. I learned quite a lot.

Larry Garfield 25:11

Thank you. See you on a future episode, hopefully.

Derick Rethans 25:16

Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time.


PHP Internals News: Episode 72: PHP 8.0 Celebrations!

PHP Internals News: Episode 72: PHP 8.0 Celebrations!

In this episode of "PHP Internals News" we're looking back at all the RFCs that we discussed on this podcast for PHP 8.0. In their own words, the RFC authors explain what these features are, with your host interjecting his own comments on the state of affairs.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:23

Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language.

Derick Rethans 0:32

This is Episode 72. PHP eight is going to be released today, November 26. In this episode, we look back across the season to find out which new features are in PHP eight dot zero. If I have spoken with the instigator of each of these features, I'm letting them explain what this new feature is. in the first episode of this current year, I spoke with Nikita Popov about weak maps, a feature that builds on top of the weak references that were introduced in PHP seven four. I asked: What's wrong with the weak references and why do we now need to weak maps.

Nikita Popov 1:10

There's nothing wrong with references. This is a reminder, what weak references are about, they allow you to reference, an object, without preventing it from being garbage collected. So if the object is unset, then you're just left with a dangling reference, and if you try to access it you will get acknowledged sort of the object. Now the probably most common use case for any kind of weak data structure is a map or an associative array, where you have objects and want to associate some kind of data with some typical use cases are caches or other memoize data structures. And the reason why it's important for this to be weak, is that you do not. Well, if you want to cache some data with the object and then nobody else is using that object, you don't really want to keep around that cache data, because no one is ever going to use it again, and it's just going to take up memory usage. And this is what the weak map does. So you use objects as keys, use some kind of data as the value. And if the object is no longer used outside this map, then is also removed from the map as well.

Derick Rethans 2:29

A main use case for weak maps will likely be ORMs and related tools. In the next, episode 38, I discussed this trainable interface with Nicolas Grekas from Symfony fame. I asked: Nicolas, could you explain what stringable is.

Nicolas Grekas 2:45

Hello, and stringable is an interface that people could use to declare that they implement some the magic to string method.

Derick Rethans 2:53

That was a short and sweet answer, but a reason for wanting to introduce this new interface was much more complicated, and are also potential issues with breaking backwards compatibility. Nikolas replied to my questioning about that with:

Nicolas Grekas 3:06

That's another goal of the RFC; the way I've designed it is that I think the actual current code should be able to express the type right now using annotations, of course. So, what I mean is that the interface, the proposal, the stringable is very easily polyfilled, so we just create this interface in the global namespace that declare the method and done, so we can do that now, we can improve the typing's now. And then in the future, we'll be able to turn that into an actual union type.

Derick Rethans 3:39

I had a chat with Nikita Popov about union types as part of the last season in Episode 33 before PHP seven four was released, but after the feature freeze for it happened. I happen to speak with Nikita quite a lot because he does so much work improving PHP. In episodes 40 and 43, we discussed a bunch of smaller features and tweaks to the language. First up was the static return type, Nikita explains:

Nikita Popov 4:05

So PHP has a three magic special class names that's self for into the current class parent, refering to the parent class, and static, which is the late static binding class name. And that's very similar to self. If no inheritance is involved than static is the same as self, introducing refers to the current class. However, if the method is inherited. And you call this method on the child class. Then, self is still going to refer to the original class, the parent. Well static is going to refer to the class on which the method was actually called.

Derick Rethans 4:50

Next, most the class name literal on objects, which adds the following:

Nikita Popov 4:55

And that just returns to the fully qualified class name, for example, have a use statement for that class, you can get back the full name instead of the short name. I think we've had this since PHP five five. That's a great feature because it's like makes it clear whether you're referencing the class and not just some random string, and that means, for example, that the IDE refactorings can work better and so on.

Derick Rethans 5:20

In PHP seven, we touched up inconsistencies in PHP is compound variable syntax, but we missed a few cases, Nikita explains:

Nikita Popov 5:28

All of these remaining consistencies are like really really minor things and edge cases. But weirdly, all or at least most of them are something that someone that at some point ran into and either open the bug or wrote me an email, or on Twitter. So people somehow managed to still run into these things.

Derick Rethans 5:56

But syntax changes are not the only thing that needs fixing; sometimes we also get the theory slightly wrong. In this case related to inconsistencies with traits, Nikita explains again:

Nikita Popov 6:07

The problem is that traits are sometimes not self contained. So to give a specific example we have in the logger PSR. We have a trait called logger treat, which has a bunch of methods like: warning, error, info, notice, and so on. So we just simple helper methods which all call the log method with a specific log level, and this trait only specified these helper methods, but still requires the actual class to implement the log method that we usually indicate that is by adding an abstract method to the trait. You have all the methods you actually want to provide by the trait and to have a number of abstract methods that the trait itself requires to work. This already works fine. The problem is just that these methods are not actually validated, or they are all inconsistently validated. Even though the trait specifies this abstract method, you could implement it in the class with a completely different signature.

Derick Rethans 7:09

Probably one of the big new features and PHP 8.0 are attributes, certainly the most discussed new feature with various RFCs to introduce a feature, and then change the syntax a few times. It all started with Benjamin Eberlei introducing a feature in April, ran Episode 47 I asked Benjamin, what attributes are. He replied:

Benjamin Eberlei 7:30

They are a way to declare structured metadata on declarations of the language. So in PHP, or in my RFC, this would be classes, class properties, class constants, and regular functions. You could declare additional metadata there that sort of tags, those declarations with specific additional machine readable information.

Derick Rethans 7:56

At the moment, many tools are already used up block comments to do a similar thing. So I asked how attributes are different. Benjamin answered with his first proposed syntax:

Benjamin Eberlei 8:06

The idea is that we introduce a new syntax that is independent of the docblock comments. Essentially, before each declaration, you can use the lesser-than symbol twice, then the attribute declaration, and then the greater-than sign twice. This is the syntax, I've used from the previous attributes RFC. Dmitri at that point, use the syntax from Hack. And it makes sense to reuse this not because Hack and PHP are going in the same direction anymore, but because Hack at that point they introduced that they had the same problems with which symbols are actually still easy to use. And we do have a problem in PHP a little bit with that kind of sort of free symbols. We can still use at certain places, and lesser-than and greater-than at this point are easy to parse. There are a bunch of alternatives, and one thing that I would probably proposes an alternative syntax, where we start with a percentage sign, then a square bracket open, and then a square bracket close. It is more in line with our Rust declares attributes, by Rust uses the sort of the hash symbol which we can't use because it's a comment in PHP.

Derick Rethans 9:23

He already hinted at alternatives for syntax and we'll get back to that in a moment. However, the main important thing is what was inside the surrounding attribute notation, and Benjamin explains again:

Benjamin Eberlei 9:35

If you declare an attribute name, and then you sort of have a parenthesis, open parenthesis close to pass optional arguments. You don't have to use them so you can only use the attribute name. If you sort of want to tag something just, this is a validator, this is an event listener, or whatever you come up with to use attributes for. But if you need to configure something in addition, then you can use the syntax sort of looks like if you would construct a new class, except that you don't have to put the new keyword in front of it.

Derick Rethans 10:10

In the rest of that episode we spoke about how to use attributes and what the general ideas behind them were. Soon after the attributes RFC was accepted, Benjamin proposed a second related RFC to tweak some of the working that came up throughout the discussion phase. At the time of the original RFC he did not want to make any changes in order to keep the discussion focused, but there were some tweaks necessary. In Episode 64 Benjamin explains what these changes were:

Benjamin Eberlei 10:38

There was renaming the attribute class. So, the class that is used to mark an attribute from PHPAttribute to just Attribute. I guess we go into detail in a few seconds but I just list them. The second one is an alternative syntax to group attributes and yeah save a little bit on the characters to type and allow to group them. The third was a way to validate which declarations an attribute is allowed to be set on, and the fourth was a way to configure if an attribute is allowed to be declared once or multiple times, on one declaration.

Derick Rethans 11:23

All the suggested tweaks passed with ease. A contentious issue, however, was the syntax to enclose the attributes. Two RFCs later, the PHP development team finally settled on the syntax which has attributes enclosed in hash, square bracket open, and close with the square bracket.

Derick Rethans 11:44

George Peter Banyard likes tidying up things in the language. I spoke with him on several occasions this season, where he was suggesting to just do that. In the first instance he's suggesting to change PHP to use locale independent floating point numbers to string conversions. He explains the problem:

George Peter Banyard 12:02

Currently, when you do a float to string conversion. So or casting or displaying a float, the conversion will depend on like the current locale. So instead of always using like the decimal dot separator. For example, if you have like a German or the French locale enabled, it will use like a comma to separate like the decimals.

Derick Rethans 12:23

He explained what he suggested to change:

George Peter Banyard 12:26

Change more or less to always make the conversion from float to string, the same so locale independent, so it always uses the dot decimal separator. With te exception of printf was like the F modifier, because that one is, as previously said locale aware, and it's explicitly said so.

Derick Rethans 12:46

The second RFC was candidly titled saner numeric strings. I asked George, what the scope of the problem he wanted to address is.

George Peter Banyard 12:55

PHP has the concept of numeric strings, which are strings which have like integers or floats encoded as a string. Mostly that would arrive when you have like a GET request or a POST request and you take like the value of the form, which would be in a string. And the issue is that PHP makes some kind of weird distinctions, and classifies numeric strings in three different categories mainly. So there are purely numeric strings, which are pure integers or pure float, which can have an optional leading whitespace and no trailing whitespace. However trailing white spaces are not part of the numeric string specification in the PHP language. To deal with that PHP has a concept of leading numeric strings, which are strings which are numeric but ...

Derick Rethans 13:42

As you can hear the way how PHP handles these numbers in strings is extremely complicated. The fix is just as complicated, but it pretty much boils down to stop treating strings like "5elephant" as a number. In the last episode, I briefly discuss Larry Garfield's object ergonomics article where he sets out a more coherent way forwards into thinking on how to solve some more of the bigger pain points of PHP. Most related to value objects. Although he did not end up proposing any RFCs himself, Nikita Popov did take some inspiration from it. And he proposed two features for inclusion into PHP eight: constructor property promotion, and named arguments. In Episode 53 Nikita explains with constructor property promotion intends to solve.

Nikita Popov 14:29

Right now, if we take a simple example from the RFC, we have a class Point, which has three properties, x y and z. And each of those has a float type. And that's really all the class is ideally, this is all we would have to write. But of course, to make this object actually usable we also have to provide a constructor. And the constructor is going to repeat that, yes, we want to accept three floating point numbers, x y and z as parameters. And then in the body we have to again repeat that. Okay, each of those parameters needs to be assigned to a property. So we have to write this x equals x, this y equals y, this z equals z. I think for the Point class. This is still not a particularly large burden. Because we have like only three properties. The names are nice and short, the types are really short, and we don't have to write a lot of code. But if you have larger classes with more properties, with more constructor arguments, with larger and more descriptive names, and also larger and more descriptive type names. And this makes up for quite a bit of boilerplate code.

Derick Rethans 15:52

I asked: What is the syntax that you're proposing to improve this?

Nikita Popov 15:57

The syntax is to merge the constructor and the property declarations, so you declare the constructor, and you add an extra visibility keyword in front of the normal parameter name. So instead of accepting float x, in the constructor. You accept public float x. And what this shorthand syntax does is to also generate the corresponding property. So you're declaring a property, public float x, and to also implicitly perform this assignment in the constructor body so to assign this x equals x. This is really all it does so it's just syntactic sugar. It's a simple syntactic transformation that we're doing, but that reduces the amount of boilerplate code you have to write for value objects in particular, because for those commonly, you don't really need much more than the properties and the constructor.

Derick Rethans 16:58

Tying in the constructor property promotion was a slightly more controversial RFC, named arguments, which I discussed with Nikita in Episode 59. I asked him what named arguments are:

Nikita Popov 17:09

Currently if you're calling a function or a method you have to pass the arguments in a certain order. So in the same order in which they were declared in the function or method declaration. And what named arguments are, and parameters allow us to do, is to instead specify the argument names, when doing the call. Just taking the first example from the RFC, we have the array_fill function, and the array_fill function accepts three arguments. So you can call like array_fill(0, 100, 50). Now, like what what does that actually mean. This function signature is not really great because you can't really tell what the meaning of this parameter is and, in which order you should be passing them. So with named parameters, the same code would be something like array_fill( start: 0, number: 100, value: 50). And that should immediately make this call, much more understandable, because you know what the arguments mean. And this is really one of the main like motivations or benefits of having named parameters.

Derick Rethans 18:21

We also briefly touched on the main issues where the introduction of named arguments could introduce backward compatibility issues.

Nikita Popov 18:28

If you don't use named arguments that nothing is going to break. But of course, if named arguments are used with codes that did not expect them, then we can run into some issues. So that's one of the issues. And the other one is more of a like long term maintenance concern, that if we introduce named parameters, then those parameters become significant to the API. Which means you cannot rename parameter names in minor versions of libraries if you're semver compatible. Of course, you might be breaking some codes, using those parameter names. And I think one of the biggest concerns that has come up in the discussion is that this is a significant increase in the API burden for open source libraries.

Derick Rethans 19:15

Beyond the main features that we've discussed so far. PHP eight also outs a few smaller ones. For example, the non capturing catches, which I discussed with Max Semenik in Episode 58. He explains his short proposal:

Max Semenik 19:29

In current PHP, you have to specify a variable for exceptions you catch, even if you don't need to use this variable in your code. And I'm proposing to change it to allow people to just specify an exception type.

Derick Rethans 19:48

This proposal password 48 votes for, and one against. The last few major PHP releases a lot of focus was put into strengthening PHP's type system. You see that and PHP seven four with additions to OO variance rules, and in PHP eight already with union types, the stringable interface and a static return type. Dan Ackroyd explains in episode 56, why he was suggesting to ask the predefined union type "mixed".

Dan Ackroyd 20:14

I have a library for validating parameters, and due to how that library needs to work the code passes user data around a lot. Internally, and then back out to whether libraries return the validator's result. So I was upgrading that library to PHP 7.4, and that version introduced property types, which are very useful things. What I was finding was that I was going through the code, trying to add types everywhere occurred. And as a significant number of places where I just couldn't add a type, because my code was holding user data. It could be any other type. The mixed type had been discussed before, an idea that people kind of had been kicking around but it just never been really worked on. So that was the motivation for me, I was having this problem where I couldn't upgrade my library, as I wanted to, I kept forgetting: has this bit of code here, been upgraded and I just can't add a type, or is it the case that I haven't touched this bit of code yet.

Derick Rethans 21:16

When I spoke with Dan, he also mentioned that sometimes he assists with writing RFCs in case some person would benefit from some technical editing, for example, due to language barriers. In the same way I ended up speaking to Dan again in Episode 65 about a null safe operator, on which he was working with Ilija Tovilo. That explains what a feature is about.

Dan Ackroyd 21:38

Imagine you've got a variable that's either going to be an object, or it could be null, so the variable is an object, you're going to want to call a method on it, which obviously if it's null, then you can't call method on it because it gives an error. Instead, what the null safe approach allows you to do is to handle those two different cases in a single line, rather than having to wrap everything with if statements to handle the possibility that it's null. The way it does this is through a thing called short circuiting, so instead of evaluating whole expression. As soon as use the null safe operator, I want the left hand side of the operator is null, and then get short circuited and it all just evalutates to null instead.

Derick Rethans 22:18

It also gets an additional benefit related to having shorter code.

Dan Ackroyd 22:24

And having the information about what the code's doing in the code, rather than in people's heads makes a lot easier for compilers and static analyzers to their jobs.

Derick Rethans 22:35

The last big nice syntax feature in PHP eight zero is the match expression, again by Ilija Tovilo. Instead of me interviewing Dan again, I decided as a joke to interview myself on the subject. It was a little bit surreal but I think it worked out well enough, as a one off event. I first explained to myself the problem with the existing switch language construct.

Derick Rethans 22:56

So, before we talk about the match expression, we really need to talk about switch. Switch is a language construct in PHP that you probably know allows you to jump to different cases depending on the value. So you have to switch statement: switch, parentheses opening, variable name, parenthesis closes, and then for each of the things that you want to match against your use case condition, and that condition can be either static value or an expression. But switch has a bunch of different issues that are not always great. So the first thing is that it matches with the equals operator, or the equals, equals sign. And this operator as you probably know, will ignore types, causing interesting issue sometimes when you're doing matching with variables that contain strings with cases that contains numbers, or a combination of numbers and strings. So, if you do switch on the string foo, and one of the cases has case zero, then it will still be matched because it could type juggle the foo to zero, and that is of course not particularly useful. At the end of every case statement you need to use break, otherwise it falls down to the case that follows. Now sometimes that is something that you want to do, but in many other cases that is something that you don't want to do and you need to always use break. If you forget, then some weird things will happen sometimes. Anothercommon thing to use it switches that we sit on on a variable. And then, what you really want to do is the result of, depending on which case's being matched assign a value to a variable. And the current way how any student now is case, say case zero, $result equals string one, break, and you have case two where you don't set: return value equals string two and so on and so on. Which isn't always a very nice way of doing it because you keep repeating the assignment, all the time. And another but minor issue with switch is that it is okay not to cover every value with a condition. So, it's totally okay to have case statements, and then not have a condition for a specific type and switch doesn't require you to add default at the end either, so you can actually end up having a condition that would never match any case, and you have no idea that that would happen.

Derick Rethans 25:16

Before I went into details, I also explained how the new match language construct could solve some of these criticisms.

Derick Rethans 25:24

The match expression is a new language keyword, which also allows you to switch depending on a condition matching a variable. You're saying matching this variable against a set of expressions just like you would do with switch. But there's a few major differences with switch here. Unlike switch, match returns a value, meaning that you can do return value equals match, then your variable that you're matching on, and the value that gets assigned to this variable is the result of the expression on the right hand side of each condition.

Derick Rethans 26:04

That's it for the new features in PHP eight, but I haven't spoken yet about a reason why the PHP team is releasing PHP eight and not PHP seven dot five. And of course, that is PHP's new JIT engine that is slated to improve performance, quite a lot. I have some concerns of my own. And in Episode 48 I spoke with Sara Goleman, which articulated my main concerns with it more eloquently.

Sara Golemon 26:31

If you go and look at the engine, particularly the runtime pieces of the engine, although the compiler's complex as well. You have to do a lot of digging before you even get to a point that you can see how the pieces maybe start to fit together. You and I have spent enough time in the engine code that we know where to look for a particular thing like let's say that opcode you mentioned that implements strlen. We know that Zend VM def dot h has got the definition for that. We also know that that file is not real code, it's a pre processed version of code that gets built later on. Somebody coming to that blind is not going to see a lot of those pieces. So there's already this big ramp up just to get into the Zend engine, as it exists now in 7.4. Let's add JIT on top of that, you've got code that is doing call forward graphs and single static analysis and finding these tracelets, and making sense of the code at a higher level than a single instruction at a time, and then distilling that down to instructions that the CPU is going to recognize, and CPU instructions are these packed complex things that deal with immediates and indirects, and indirects of indirects, and registers, and the x86 call API is a ridiculous thing that nobody should ever have to look at. So you add all this complexity to it, that by the way, sits in ext/opcache, it's all isolated to this one extension, that reaches into the engine and fiddles around with things to make all this JIT magic happen and we're going to take your reduced set of developers who know how to work on Zend engine and you're going to reduce that further. I think at the moment it's still only about three or four people who actually understand how PHP's JIT is put together enough that they can do any effective work on it.

Derick Rethans 28:20

I am still a little apprehensive about whether the effort of introducing a JIT engine is going to pay off. I certainly hope that I'm going to be proven wrong and that the JIT engine is going to be a massive performance boost. But in the end, we do definitely need more people to understand and work on the PHP engine and a new JOT engine that is built into opcache. Perhaps that's something you yourself might want to have a look at in 2021. PHP 8 will be out later today and I hope that you're pleased with all the new features that a PHP development worked on hard throughout the year. With this I'm concluding this episode and also this year's season. I will be back in the new year with more episodes where I hope to demystify the development of the PHP engine some more. Enjoy the holidays and stay safe.

Derick Rethans 29:08

Thank you for listening to this installment of PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next year.


PHP Internals News: Episode 71: What didn’t make it into PHP 8.0?

PHP Internals News: Episode 71: What didn’t make it into PHP 8.0?

In this episode of "PHP Internals News" we're looking back at all the RFCs that we discussed on this podcast for PHP 7.4, but did not end up making the cut. In their own words, the RFC authors explain what these features are, with your host interjecting his own comments on the state of affairs.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:15

Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 71. At the end of last year, I collected snippets from episodes about all the features that did not make it into PHP seven dot four, and I'm doing the same this time around. So welcome to this year's 'Which things were proposed to be included into PHP 8.0, but didn't make it. In Episode 41, I spoke with Stephen Wade about his two array RFC, a feature you wanted to add to PHP to scratch an itch. In his own words:

Steven Wade 0:52

This is a feature that I've, I've kind of wish I would have been in the language for years, and talking with a few people who encouraged. It's kind of like the rule of starting a user group right, if there's not one and you have the desire, then you're the person to do it. A few people encouraged to say well why don't you go out and write it? So I've spent the last two years kind of trying to work up the courage or research it enough or make sure I write the RFC the proper way. And then also actually have the time to commit to writing it, and following up with any of the discussions as well.

Steven Wade 1:20

I want to introduce a new magic method the as he said the name of the RFC is the double underscore to array. And so the idea is that you can cast an object, if your class implements this method, just like it would toString; if you cast it manually, to array then that method will be called if it's implemented, or as, as I said in the RFC, array functions will can can automatically cast that if you're not using strict types.

Derick Rethans 1:44

I questioned him on potential negative feedback about the RFC, because it suggested to add a new metric method. He answered:

Steven Wade 1:53

Beauty of PHP is in its simplicity. And so, adding more and more interfaces, kind of expands class declarations enforcement's, and in my opinion can lead to a lot of clutter. So I think PHP is already very magical, and the precedent has been set to add more magic to it with seven four with the introduction of serialize and unserialize magic methods. And so for me it's just kind of a, it's a tool. I don't think that it's necessarily a bad thing or a good thing it's just another option for the developer to use

Derick Rethans 2:21

The RFC was not voted on and a feature henceforth did not make it into PHP eight zero.

Derick Rethans 2:27

Operator overloading is a topic that has come up several times over the last 20 years that PHP has been around as even an extension that implements is in the PECL repository. Jan Bøhmer proposed to include user space based operator overloading for PHP eight dot zero. I asked him about a specific use cases:

Jan Böhmer 2:46

Higher mathematical objects like complex numbers vectors, something like tensors, maybe something like the string component of Symfony, you can simply concatenate this string object with a normal string using the concat operator and doesn't have to use a function to cause this. Most basically this should behave, similar to a basic string variable or not, like, something completely different.

Derick Rethans 3:16

For some issues raised during the RFC process and Jan explains to the most notable criticisms.

Jan Böhmer 3:21

First of all, there are some principles of operator overloading in general. So there's also criticism that it could be used for doing some very weird things with operator overloading. There was mentioned C++ where the shift left shift operator is used for outputting a string to the console. Or you could do whatever you want inside this handler so if somebody would want to save files, or modify a file in inside an operator overloading wouldn't be possible. It's, in most cases, function will be more clear what it does.

Derick Rethans 4:01

He also explained his main use case:

Jan Böhmer 4:04

Operator overloading should, in my opinion, only be used for things that are related to math, or creating custom types that behave similar to build types.

Derick Rethans 4:15

In the end, the operator overloading RFC was voted on. But ultimately declined, although there was a slim majority for it.

Derick Rethans 4:24

In Episode 44, I spoke with Máté Kocsis about the right round properties RFC and asked him what the concept behind them was. He explained:

Máté Kocsis 4:33

Write once properties can only be initialized, but not modified afterwards. So you can either define a default value for them, or assign them a value, but you can't modify them later, so any other attempts to modify, unset, increment, or decrement them would cause an exception to be thrown. Basically this RFC would bring Java's final properties, or C#'s read only properties to PHP. However, contrary to how these languages work, this RFC would allow lazy initialization, it means that these properties don't necessarily have to be initialized until the object construction ends, so you can do that later in the object's lifecycle.

Derick Rethans 5:22

Write once properties was not the only concept that he had explored before writing this RFC. We discussed these in the same episode:

Máté Kocsis 5:31

The first one was to follow Java and C# and require all right, once properties to be initialized until the object construction ends, and this is what we talked about before. The counter arguments were that it's not easy to implement in PHP, the approach is unnecessarily strict. The other possibility is to let unlimited writes to these properties, until object construction ends and then do not allow any writes, but positive effect of this solution is that it plays well with bigger class hierarchies, where possibly multiple constructors are involved, but it still has the same problems as the previous approach. And finally the property accessors could be an alternative to write once properties. Although, in my opinion, these two features are not really related to each other, but some say that property accessors could alone, prevent some unintended changes from the outside, and they say that maybe it might be enough. I don't share this sentiment. So, in my opinion, unintended changes can come from the inside, so from the private or protected scope, and it's really easy to circumvent visibility rules in PHP. There are quite some possibilities. That's why it's a good way to protect our invariance.

Derick Rethans 7:02

In the end this RFC was the client, as it did not wait to two thirds majority required with an even split between the proponents and the opponents.

Derick Rethans 7:11

Following on from Máté's proposal to add functionality to our object orientation syntax. I spoken Episode 49 with Jakob Givoni on a suggested addition COPA, or in full: contact object property assignments Jakob explains why he was suggesting to add this.

Jakob Givoni 7:28

As always possible for a long time why PHP didn't have object literals, and I looked into it, and I saw that it was not for lack of trying. Eventually I decided to give it a go with a different approach. The basic problem is simply to be able to construct, populate, and send an object in one single expression in a block, also called inline. It can be like an alternative to an associative array: you give the data, a well defined structure, the signature of the data is all documented in the class.

Derick Rethans 8:01

Of course people abuse associative arrays for these things at the moment, right. Why are you particularly interested in addressing this deficiency as you see it?

Jakob Givoni 8:11

Well I think it's a common task. It's something I've been missing as I said inline objects, obviously literals for a long time and I think it's a lot of people have been looking for something like this. And also it seemed like it was an opportunity that seemed to be an fairly simple grasp.

Derick Rethans 8:28

I also asked them what the main use case for this was.

Jakob Givoni 8:32

Briefly, as I mentioned, they're data transfer objects, value objects, those simple associative arrays that are sometimes used as argument backs to constructors when you create objects. Some people have given some examples where they would like to use this to dispatch events or commands to some different handlers. And whenever you want to create, populate, and and use the object in one go, COPA should help you.

Derick Rethans 9:04

COPA did also not make it into PHP eight with the RFC being the client nearly unanimously. The proposals by both Máté and Jakob where meant to improve PHP object syntax by helping out with common tasks. The implementation ideas of what they were trying to accomplish were not particularly lined up. This spurred on Larry Garfield to write a blog post titled: object ergonomics, which are discussed with him in Episode 51. I first asked him why he wrote this article:

Larry Garfield 9:33

As you said, there's been a lot of discussion around improving PHP's general user experience of working with objects in PHP, where there's definitely room for improvement, no question. And I found a lot of these to be useful in their own right, but also very narrow, and narrow in ways that solve the immediate problem, but could get in the way of solving larger problems later on down the line. I went into this with an attitude of: Okay, we can kind of piecemeal attack certain parts of the problem space, or we can take a step back and look at the big picture and say: All right, here's all of the pain points we have, what can we do that would solve, not just this one pain point, but let us solve multiple pain points with a single change, or these two changes together solve this other pain point as well, or, you know, how can we do this in a way that is not going to interfere with later development that we talked about. We know we want to do, but hasn't been done yet. Are we not paint ourselves into a corner by thinking too narrow.

Derick Rethans 10:40

The article mentions many different categories and possible solutions. I can't really sum these up in this episode because it would be too long. Although, Larry did not end up proposing RFC based on this article, it can be called responsible for constructor property promotions, which I discussed with Nikita Popov in Episode 53 and Named Arguments which are discussed with Nikita in Episode 59. Both of these made it into PHP 8.zero and cover some of the same functionality that Jakob's COPA RFC covered. I will touch on the new features that did make it into PHP 8.0 in next week's episode. There are two more episodes where discuss features that did not make it into PHP eight zero, but these are still under discussion and hence might make it into next year's PHP eight dot one. In Episode 57, I spoke with Ralph Schindler about his conditional code flow statements RFC. After the introduction, I asked what he specifically was wanting to introduce.

Ralph Schindler 11:36

This is, you know, it's, it's very closely related to what in computer science is called a guard clause. And I used that phrase lightly when I originally brought it up on the mailing list but it's very close in line to that it's not necessarily exactly that, in terms of the syntax. In terms of like when you speak about it in the PHP code sense, it really is sort of a change in the statement. So putting the return before the if, that's really what it is. So a guard clause, it's important to know what that is is it's a way to interrupt the flow of control

Derick Rethans 12:08

Syntax proposals are fairly controversial, and I asked Ralph about his opinions of the type of feedback that he received.

Ralph Schindler 12:15

The smallest changes always get the most feedback, because there's such a wide audience for a change like this.

Derick Rethans 12:23

The last feature that did not make it into PHP eight zero was property write/set visibility, which I discussed with André Rømcke in Episode 63. I asked him what his RFC was all about:

Derick Rethans 12:34

What is the main problem that you're wanting to solve with what this RFC proposes?

André Rømcke 12:40

The high level use case is in order to let people, somehow, define that their property should not be writable. This is many benefits in, when you go API's in order to say that yeah this property should be readable. But I don't want anyone else but myself to write it. And then you have different forms of this, you have either the immutable case where you, ideally would like to only specify that it's only written to in constructor, maybe unset in destructor, maybe dealt with in clone and so on, but besides that, it's not writable. I'm not going into that yet, but I'm kind of, I was at least trying to lay the foundation for it by allowing the visibility or the access rights to be asynchoronus, which I think is a building block from moving forward with immutability, read only, and potentially also accessors but even, but that's a special case.

Derick Rethans 13:39

At the time of our discussion he already realized that it would be likely postponed to PHP eight dot one as it was close to feature freeze, and the RFC wasn't fully thought out yet. I suspect we'll hear more about it in 2021. With this I would like to conclude this whirlwind tour of things that were proposed but did not make it in. Next week I'll be back with all the stuff that was added to PHP for the PHP eight zero celebrations. Stay tuned.

Derick Rethans 14:09

Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.