PHP Internals News: Episode 89: Partial Function Applications

PHP Internals News: Episode 89: Partial Function Applications

In this episode of "PHP Internals News" I chat with Larry Garfield (Twitter) and Joe Watkins (Twitter, GitHub, Blog about the "Partial Function Applications" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 89. Today I'm talking with Larry Garfield and Joe Watkins about a partial function application RFC that they're proposing with Paul Crevela and Levi Morrison. Larry, would you please introduce yourself?

Larry Garfield 0:36

Hello World. I'm Larry Garfield or Crell on most social medias. I'm a staff engineer for Typo3 the CMS. And I've been getting more involved in internals these days, mostly as a general nudge and project manager.

Derick Rethans 0:52

And hello, Joe, would you please introduce yourself as well?

Joe Watkins 0:55

Hi, I'm Joe, or Krakjoe, I do various PHP stuff. That's all there is to say about that really.

Derick Rethans 1:02

I think you do quite a bit more than just a little bit. In any case, I think for this RFC, you, you wrote the implementation of it, whereas Larry, as he said, did some of the project management, I'm sure there's more to it than I've just paraphrased in a single sentence. But can one of you explain in one sentence, or if you must, maybe two or three, what partial function applications, or I hope for short, partials are?

Larry Garfield 1:27

Partial function application, in the broadest sense, is taking a function that has some number of parameters, and making a new function that pre fills some of those parameters. So if you have a function that takes four parameters, or four arguments, you can produce a new function that takes two arguments. And those other two you've already provided a value for in advance.

Derick Rethans 1:54

Okay, I feel we'll get into the details in a moment. But what are its main benefits of doing this? What would you use this for?

Larry Garfield 2:01

Oh, there's a couple of places that you can use partial application. It is what got me interested. It's very common in functional programming. But it's also really helpful when you want to, you have a function that like, let's say, string replace takes three arguments, two of which are instructions for what to replace, and one of which is the thing in which you want to replace. If you want to reuse that a bunch of times, you could build an object and pass in constructor values and save those and then call a function. Or you can just partially apply string replace with the things to search for, and the things to replace with and get back a function that takes one argument and will do that replacement on it. And you can then reuse that over and over again. There are a lot of cases like that, usually use in combination with functions that wants a callback. And that callback takes one argument. So array map or array filter are cases where very often you want to give it a function that takes one argument, you have a function that takes three arguments, you want to fill in those first ones first, and then pass the result that only takes one argument to array map or a filter, or whatever. So that's the one of the common use cases for it.

Derick Rethans 3:15

That's the benefits and some of its background comes from functional programming, as you've just mentioned. What is the syntax that you're proposing and some of the semantics?

Larry Garfield 3:26

The syntax that we've developed, are two placeholders that you can use in a function call. So if you're calling a function as you normally would, but for one of the arguments, you pass a question mark, or at the tail end, you have an ellipsis (dot dot dot), then that tells the engine: This is not a function call. This is a partial application. And what it will do is return not the result of the function but return a closure object that has the the arguments that correspond to those question marks. And then when called with those arguments, we'll pass those along with the original function. Probably easier to explain, if I use a concrete example, using the string replace example we talked about before, you would call it with str_replace, the example from the RFC, hello, hi, question mark. What that gives you is a callable, a closure that has one argument, which will take its type and name from str_replace. So the third argument to str_replace essentially gets copied into that closure. And what closure does internally when you call it with that one argument is it just calls string replace with hello, hi, and whatever argument you gave it and returns that value. It is conceptually very, very similar to just writing a short lambda or an arrow function that takes one arguments and calls string replace hello, hi, and that argument. In most cases, it ends up functioning almost exactly like that. There's a few subtle differences in a few places. But most of the time, you can think of it working essentially like that. The question mark means one required argument only. The dot dot dot means zero or more arguments, if you want to, say provide the first argument to a function, and then dot dot dot would mean: And then all of the other arguments, however many there are, even if it's that zero, those are what's left, which languages other languages that have partial application as a first class feature, usually end up doing it that way where you can only pre fill from the left. PHP, because the placeholder lets us do it in any order. So we can skip over arguments if we want to, which is quite nice. But it means that you can take a function and reduce it to, I want to prefill just these two arguments and leave these three arguments for the new function, or I want to prefill these arguments from the left, and then everything else, whatever it is, is left. It also lets you do cute things like if you provide all of the arguments to a function, and then just tack on a dot dot dot the end of it, then you get back a closure that takes essentially zero arguments. But when called, will call that other function. So it's lets lets you really easily build a delayed function as you need to.

Derick Rethans 6:15

When do the arguments to the function get evaluated then?

Larry Garfield 6:18

Arguments are evaluated in advance. So this is the subtle difference between partial application and the short lambda syntax. In a short lambda, what happens is, essentially, that entire expression on the right hand side gets wrapped up into a closure. And so any arguments that are compound like they have a function call that is inside one of the placeholders, or one of the arguments, that'll get evaluated later. With partial application, the function that is in a parameter position gets evaluated first and reduced to a value. And that value gets partially applied to the function. 90% of the time, that's not going to be an issue. There are a few cases where doing it one way or the other may be subtly different, but you'll spot those fairly easily.

Derick Rethans 7:02

So the RFC talks about things that you can do, but also a few things that you cannot do or don't want to do yet. What are these things that partials won't support, or run support yet, at least?

Larry Garfield 7:13

The main thing that it doesn't support is named placeholders. You can pre fill a value or an argument with a named named argument. But not a named placeholder. Those have to be positional. Named placeholders are complicated to implement, and run into a question of, if you provide those in a different order, does that also change the order of the arguments in the partially applied function that you get back in that closure? And there's a good argument to be made that either way is logical. And so we're like, no, does not deal with it, too complicated. We'll just positional only. And you cannot specify an optional arguments either. It's just again, too complicated. Things get too weird. If you have those advanced cases, use our short lambda, that works just fine. If you want to just make a new function that defers to a new function, and change its API in the process, short lambda works fine. And it's still quite short.

Derick Rethans 8:13

I know the RFC talks a little bit about references, but I don't like talking about references. So let's skip that part. In my opinion, they should be removed from the language. But I know we can't.

Larry Garfield 8:22

There's occasionally used for them. But very occasionally.

Derick Rethans 8:25

There's a bunch of technical things that I also want to chat about. And hopefully, Joe, if you want to fill in, I'd be more than welcome to hear your opinions on these things. But the first one is that PHP has this thing called func_get_args. How does that work with these partials? How does that tie in together?

Joe Watkins 8:42

It should mostly behave as if you've invoked the function directly. We don't want there to be a huge discrepancy between. The callee know whether they've been called through partial application or complete application. It should be the same.

Derick Rethans 8:58

That is good to know. I mean, I always like it how things work as people expect them to work, right?

Joe Watkins 9:03

Yeah.

Derick Rethans 9:04

We already have used the dot dot dot operator for variadics. But you're reusing the dot dot dot, or ellipses, as you more eloquently call it earlier. Here again, as well, is that not going to cause issues? Or does that tie in well together?

Joe Watkins 9:18

Well, there's quite a lot of debate about what's the right symbol to use. I think it's dot dot dot, and I think Larry agrees with me. But there's some people who want to stick an extra question mark on the end, which to me looks like it reads zero to one. And to Larry, it looks like an extra character that's just not needed. Other people say it makes sense for them. But if you can type three characters and not four, I mean, you need a really good argument. The arguments that have been put forward so far don't really make very much sense for me. Maybe we should ask that question and it doesn't really matter. In the end, what the syntax is, is if it's a difference between it getting in and not getting in, then we'll just put the extra question mark on there. I don't really have a really good argument to change it like to be like that.

Derick Rethans 10:05

To be honest, to me, it looks like you then have two placeholders.

Joe Watkins 10:09

Yeah.

Derick Rethans 10:10

I don't feel the need for it.

Joe Watkins 10:11

That's also another argument because we've introduced this one symbol, and then this other symbol, and then you put them together. And that's two things. I mean, you can't have one and one equals one.

Derick Rethans 10:20

Fair enough. The RFC does touch on another quite interesting thing, I think, which is constructors, which it also be able to partially apply. But of course, you've mentioned that, that arguments get applied immediately when you do the substitution, when you do the partial application. But of course, the constructor is a bit weird because a constructor runs immediately after an object has been constructed. So how does that work together with partials?

Joe Watkins 10:47

So at first, we made it so like if you invoke a constructor with reflection, and you just invoke it over and over again, it'll invoke it on the same object, you won't get back a new object. It's not the constructor that returns the object, it's the new operator. So first, we had a bit dumb. And we did just like what reflection does. And if you applied to a constructor, you'd get back a closure that just repeatedly invokes the constructor, which is, as Larry called it, quite naive. So we went back and revisited that. And so now it acts like a factory. Every time you invoke the closure return from an application, you get a brand new object, which is more in line with what people expect. And it's also quite cool. It's one of my favourite bits, actually as it turns out.

Derick Rethans 11:31

In my opinion, it also makes more sense than then having an apply to the same object over and over again. Whether I'd like it or not, I don't know yet.

Joe Watkins 11:39

Oh, the other option is traditional constructors to avoid the surprising behaviour. But that would be just a strange.

Larry Garfield 11:45

There are a lot of use cases where you want to take a bunch of values, convert them to objects using an array map, supporting constructors for that makes total sense to me.

Derick Rethans 11:54

And I would probably say, though, that I would prefer not allowing it over it applying over the same object over again. You've touched a little bit on some common cases where you want to use this, do you perhaps have some other ideas where this might be really useful?

Larry Garfield 12:10

So there's three use cases that we think are probably going to be the lion's share. One is to just use the dot dot dot operator. So you have some function or method call, call it with dot dot dot, and that's it. You prefill nothing, which gives you back a closure that is identical in signature to the function or the method that you're applying it to. Everything we've said about functions applies the methods here as well. Which means we now effectively have a new way to refer to a function or a method and make a callable out of it, that doesn't involve just sticking it into a string. You just say, hey, function called dot dot dot, or an arrow bar, parentheses, dot dot dot, parentheses. And now you can turn any function or method into a callable and pass that around. And it's still, it's not wrapped up into the silly array format, it's still accessible to static analysers and refactoring tools. Hopefully, with this, you will never need to refer to a function name using a string ever again, never refer to a method call as an array of object and method. So that that just is not needed any more in the vast majority of cases.

Derick Rethans 13:20

That alone is probably worth having them, maybe.

Larry Garfield 13:23

And Nikita had an RFC that was doing just that, and nothing else. It's kind of a junior version of this. I don't think that's necessary, the full full scope here works, and gives us that. The second use case that I think is going to be common are unary functions. That's functions that take a single argument. More to the point, as I mentioned before, a lot of functions take a callback. And that callback needs a single argument, array map, array filter, some validation routines, a lot of other things like that. So it's now stupidly easy to take any arbitrary function or method and turn it into a single parameter function, which you can then pass as a callback to array map, array filter, all these other tools, and it just becomes really easy to pre fill things that way. The third is the other one I mentioned earlier, if you pre fill all the arguments, and then just put a dot dot dot at the very end, which means zero or more, you now have a function that takes no arguments, but calls the original function you specified with all the arguments you specified. This often the case for default values, where I want to have a default value available, but don't want to take the time to compute it in advance because it might be expensive. Whatever function it is that will determine that default value, I just partially apply that and give it all the arguments and I get back a callable. That creating a callable is dirt cheap, but when I actually need that value, I can then call it at that time, but it won't actually get called unless I need it. That's another use case that we expect to be common. There are no doubt others that we haven't thought of, or that will be less common, but still useful. I think this will probably replace a large chunk of the use cases for short lambdas. Not because short lambdas are bad, they're wonderful. But so many of them convert a function to a simpler function. And this gives us an even more compact, more readable syntax for that, with even less extra symbols and flotsam around it.

Derick Rethans 15:24

I saw, hopefully as a joke, saying that, instead of using the question mark, we should use dollar sign dollar sign, and then we should call the token name T_BLING.

Larry Garfield 15:36

This RFC actually has a storied history. Several years ago, Sara Golemon had proposed porting the pipe operator from Hack to PHP. The pipe operator is an operator available in a lot of different languages that lets you string together a series of functions. So you pass a function, pass an argument into one function, its results you pass to the next function, its results, you pass the next function and so on, which is a good case for unary functions. In Hack's syntax, they don't use a function on the right hand side, they use an arbitrary expression, and then dollar dollar as a placeholder for where to put the value from the left hand side from the previous step. It's the only language that does that.

Derick Rethans 16:20

The other language that does it is bison.

Larry Garfield 16:23

Or Bison also does that style of?

Derick Rethans 16:25

It does something weird like that, yeah. Have a look at the grammar file.

Larry Garfield 16:29

I've looked in there. It's scary. So at the time, she didn't actually put an implementation in for it. But there was some discussion about it. I joked that if she wanted to do that, she should call it T_BLING. And she thought it was hilarious, but never went anywhere. A year ago, I started working on a pipe operator RFC that did just the pipe part, but used a callable on the right hand side, instead of an expression, more like F#, and Haskell, and other languages that have a pipe operator. And their main response to that was, we'd like this, this is cool, except that just using short lambdas on the right all the time to make unaries is too ugly. We want partial application first. So I spent a while trying to bribe someone with more experience and knowledge than me to work on partial application. I tried bribing Ilya Tovolo, to do so by working with him on enumerations. And we got enumerations in, but he doesn't have the time to work on partial application. Levi and Paul had already written an RFC for partial application that had no implementation. It's just a skunkworks, essentially. Then a few weeks ago, Joe pops up and starts working on an implementation for partials. And I, to this day, don't know what interested him in it. But I'm very happy about this fact. So as we updated the RFC, I knew that people want a bike shed about syntax. So I threw that in as a joke. I don't think we're actually going to do that. It's just a little inside reference that is now no longer inside.

Derick Rethans 17:56

Joe what made you work on partials, then?

Joe Watkins 17:58

It's interesting to write. I've had my fun whether it gets in or not.

Derick Rethans 18:02

Sometimes that's the case, right? So just working on this is all the fun.

Larry Garfield 18:06

Sometimes it's fun to just run down rabbit holes for the heck of it. And sometimes really cool things can come out of that sometimes.

Derick Rethans 18:12

At some point, I might have to implement support for partials like I have for closures in Xdebug as well. Because at some point, people might want to debug these things. So I'm a little bit interested in how do these the closures that it generates? Where does it store the already applied arguments?

Joe Watkins 18:29

So partials have the same binary struct up to this point or of the closure, and then after that there's some extra fields.

Derick Rethans 18:36

Would they still have the names?

Joe Watkins 18:39

No, because named arguments aren't actually named, that information is lost. By the time we've got them, we don't have any name information. We've only got their correct position, according to the call that was made.

Derick Rethans 18:50

And every argument that hasn't been filled and doesn't have a special placeholder in there, or does it keep track of which ones have been filled in?

Joe Watkins 18:56

We've got two special placeholders internally, you won't see as undef or null or anything.

Derick Rethans 19:02

Okay, that's good to know. What has the reaction been so far?

Larry Garfield 19:05

Slightly positive. There were a lot of discussions early on about do we support argument reordering? And should it use a single placeholder or two separate placeholders? Originally, we had one and realized after a while, that doesn't actually work. There're use cases where that will be confusing. Overall, the feedback has been quite positive, and I fully expect that to pass. Really the only question people are still debating about at this point is ellipsis versus ellipses question mark.

Joe Watkins 19:34

Yeah, I think the first version of the RFC was quite well received. Someone said we could document it as to make a partial sprinkle or question mark over it and hope for the best.

Derick Rethans 19:44

Oh, that's good to hear. With feature freeze coming not pretty soon now. When do you think you're putting this up for a vote?

Larry Garfield 19:51

Probably in the next couple of days. The only question I think is whether we include a second question for which variadic placeholder to use, which syntax/ Or if we just say it's dot dot dot, go away. Other than that it should go to a vote probably before this episode airs.

Derick Rethans 20:06

Thank you very much, both of you for taking the time to me today to talk about partials.

Larry Garfield 20:11

Thank you again Derick, hopefully see you once more on this season.

Joe Watkins 20:15

Thanks Derick, see you soon.

Derick Rethans 20:21

Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time.


PHP Internals News: Episode 88: Pure Intersection Types

PHP Internals News: Episode 88: Pure Intersection Types

In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Pure Intersection Types" RFC that he has proposed.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 88. Today I'm talking with George Peter Banyard about pure intersection types. George, could you please introduce yourself?

George Peter Banyard 0:30

Hello, my name is George Peter Banyard. I work on PHP code development in my free time. And on the PHP Docs.

Derick Rethans 0:36

This RFC is about intersection types. What are intersection types?

George Peter Banyard 0:40

I think the easiest way to explain intersection types is to use something which we already have, which are union types. So union types tells you I want X or Y, whereas intersection types tell you that I want X and Y to be true at the same time. The easiest example I can come up with is a traversable that you want to be countable as well. So traversable and countable. Currently, you can do intersection types in very hacky ways. So you can either create a new interface which extends both traversable and countable, but then all the classes that you want to be using this fashion, you need to make them implement the interface, which might not be possible if you using a library or other things like that. The other very hacky way of doing it is using reference and typed properties. You assign two typed properties by reference, one being traversable, one being countable, and then your actual property, you type alias reference it, with both of these properties. And then my PHP will check: does the property respect type A those reference? If yes, move to the next one. It doesn't respect type B, which basically gives you intersection types.

Derick Rethans 1:44

Yeah, I saw that in the RFC. And I was wondering like, well, people actually do that?

George Peter Banyard 1:49

The only reason I know that is because of Nikita's slide.

Derick Rethans 1:51

The thing is, if it is possible, people will do it, right. And that's how that works.

George Peter Banyard 1:56

Yeah, most of the times.

Derick Rethans 1:57

The RFC isn't actually called intersection types. It's called pure intersection types. What does the word pure do here?

George Peter Banyard 2:05

So the word pure here is not very semantic. But it's more that you cannot mix union types and intersection types together. The reasons for it are mostly technical. One reason is how do you mix and match intersection types and union types? One way is to have like union types take precedence over intersection types, but some people don't like that and want to explicit it grouping all the time. So you need to do parentheses, A intersection B, close parentheses, pipe for the union, and then the other type. But I think the main reason is mostly the variance, like the variance checks for inheritance are already kind of complicated and kind of mind boggling.

Derick Rethans 2:44

I'm sure we'll get into the variance rules in a moment. What is it actually what you're proposing to add here. What is the syntax, for example?

George Peter Banyard 2:52

So the syntax is any class type with an ampersand, and any other class type gives you an intersection type, which is the usual way of doing and.

Derick Rethans 3:01

When you say class types, do you also mean interfaces?

George Peter Banyard 3:04

Yes, PHP has a concept of class types, which are mostly any class in any interface. There's also a weird exception where parent and self are considered class types, but those are not allowed.

Derick Rethans 3:20

Okay, so it's just the classes that you've defined and the class that are part of the language but not a special keywords, self and parent and static, I suppose?

George Peter Banyard 3:28

Yes, the reason for that is standard types are not allowed to be part of an intersection, because nothing can be an integer and a string at the same time. Now, there are some of the built in types, which can be kind of true. You could have a callable, which is a string, because callables can be arrays, or can be a closure. But that's like very weird and not very great. The other one is iterable. If when you expand that out, you get redundant types, which we can talk about later. And the final thing is parent, self, and static, just makes for some very weird design questions, in my opinion, like, if you ask for something to be an intersection with itself, you basically can only enforce conditions on subclasses. You have a class and you say: Oh, I want it to return self, but also be countable for some reason, but I'm not countable. So if you extend me, then you need to be countable, but I'm not. So it's very weird. parent has kind of the very same weird semantics where you can ask a parent, but it's like, if the base class doesn't support it, and you ask for a parent to be an intersection, then you basically need the child to implement the interface and then a child to return the first child. If you do that main question. Why? Because I don't see any good reasons to do it. And it just makes everything harder.

Derick Rethans 4:40

You've only added for the sake of completeness instead of it being useful. Let's move on birds. You've mentioned which types are supported, which is class names and interface names. You already hinted a little bit at redundant types. What are redundant types?

George Peter Banyard 4:56

Currently, PHP already does that with union types. If you repeat the type twice in a union, you'll get a compile error. This only affects compiled time known aliases. If you use a use statement, then PHP knows that you basically using the same type. However you use a runtime alias, then it can't detect that.

Derick Rethans 5:13

A runtime alias, what's that?

George Peter Banyard 5:15

So if you use the function class_alias.

Derick Rethans 5:16

It's new to me!

George Peter Banyard 5:18

it technically exists. It also doesn't guarantee basically that the type is minimal, because it can only see those was in its own file. For example, if you say I want A and B, but B is a child class of A, then the intersection basically resolves to only B. But you can only know that at runtime if classes are defined in different files. So the type isn't minimal. But if you do redundant types, basically, it's a easy way to check if you might be typing a bug.

Derick Rethans 5:46

You try to do your best to warn people about that. But you never know for certain.

George Peter Banyard 5:51

You never know for certain because PHP doesn't compile everything into like one big program like in check. Static analyser can help for that.

Derick Rethans 5:59

Let's talk a little bit about technical aspects, because I recommend that implementing intersection types are quite different from implementing union types. What kind of hacks that you have to make in a parser and compiler for this?

George Peter Banyard 6:11

Our parser has being very weird. The parsing syntax should be the same as union types. So I just copy pasted what Nikita did. I tried it. It worked for return types without an issue. It didn't work with argument types, because bison, which is the tool which generates our parser, was giving a shift reduce conflict, which basically tells: Oh, I got two possible states I can go in, and I don't know which branch I need to go, because the PHP parser only does one look ahead. Because it was conflicting, the ampersand, either for the intersection type or for to mark a reference. Normally, if the paster is more developed, or does more look ahead, it is not a conflict. And it shouldn't be. Ilia managed to came up with this ingenious idea, which is just redefine the ampersand token twice and have very complicated names, and just use them in different contexts. And bison just: now I have no issue. It is the same token, it is the same character. Now that you have two different tokens it manages to disambiguate, like it's shift produce. So that's a very weird.

Derick Rethans 7:17

I'll have a look at what that actually does, because I'm curious now myself. Beyond the parser, I think the biggest and most complicated part of this is implementing the variance rules for these intersection types. Can you give a short summary of what a variance rules are, and potentially how you've actually implemented them?

George Peter Banyard 7:38

Since PHP seven point four, return types and up covariant, and parameter types are contravariant. Covariant means you can like restrict, we can be more specific. And contravariance means you can be broader or like more generic. Union types already gives some interesting covariance implications. Usually, you would think, well, a union is always broader than a single type, you say: Oh, I want either a traversable or accountable, it seems that you're expanding the type sphere. However, a single type can have as a subtype, a union type. For example, you say,:Oh, my base type is a Class A, and I have two child classes, which are B and C. I can type covariantly that I want either B or C, because B or C is more specific than just A. That's what union types over there allows you to do. And the way how it's implemented. And how to check for that is you traverse the list of child types, and check that the child type is an instance of at least one of the parents types. An intersection by virtue of you adding constraints on the type itself will always be more specific than just a single type. If you say: Oh, I want a class A, then more specifically, so I want something of class A and I want it to be countable. So you're already restrict this, which gives some very interesting implications, meaning that a child type can have more types attached to itself than a parent type. That's mostly due how PHP implements its type system, to make the distinctions, basically, I've added the flag, which is either this is a union, meaning that you need to check it is part of one, or it's an intersection. The thing with intersection types is that you need to reverse the order in how you check the types. So you basically need to check that the parent is at least an instance of one of the child types, but not that none of the child types is a super type of the parent type. Let's say you have class C, which extends Class B and Class B extends Class A. If I say let's say my base type is B to any function, and I give something which is a intersection T, any interface, this would not be a valid subtyping relation to underneath B. Because if you looked it was a Venn diagram in some sense, you've got A which is this massive sphere, you've got B which is inside it, and C which is inside it. A intersection something intersects the whole of A with something else, which might also intersect with B in a subset, but it is wider than just B, which means like the whole variance is very complicated in how you check it because you can't really reuse the same loop.

Derick Rethans 10:13

I can't imagine how much more complicated this gets when you have both intersection and union types in the same return type or parameter argument type.

George Peter Banyard 10:22

One of the primary reasons why it's currently not in the RFC, because it is already mind boggling. And although I think it shouldn't be that hard to like, add support for it down the line, because I've already split it mostly up so it should be easy to check: Oh, is this an intersection? Is this a union? And then you need to branch.

Derick Rethans 10:42

Luckily because standard types aren't included here, you also don't really have to think about coercive mode and strict mode for these types. Because that's simply not a thing.

George Peter Banyard 10:50

That's very convenient.

Derick Rethans 10:52

Is the future scope to this RFC?

George Peter Banyard 10:54

The obvious future scope is what I call composite types, is you have unions and intersections available in the same type. The main issue is mostly variance, because it's already complicated, adding more scope to it, it's going to make the variance go even harder. I think with most programming languages, the variance code is always complicated to read. While I was researching some of it, I managed to hit a couple of failures, which where with I think was Julia and the research paper I was it was just like focusing on a specific subset. And like, basically proving that it is correct. It's not a very big field. Professors at Imperial, which I've talked to, have been kind of helpful with giving some pointers. They mostly work with basically proper languages or compiled languages, which have this whole other set of implications. Apparently, they have like a bunch of issues about how you normalize the types like in an economical form, to make it easier to check. Which is probably one of the problems that will need to be addressed, when you get like such a intersection and union type. First, you normalize it to some canonical form, and then you work with it. But then the second issue is like how do you want the composite types to actually be? Is it oh, you have got parentheses when you want to mix and match? Or can you use like union precedence? I've heard both opinions. Basically, some people are very dead against using Union as a precedent.

Derick Rethans 12:14

My question is going to be, is this actually something people would use a lot?

George Peter Banyard 12:21

I don't think it would be used a ton. The moment you want to use it, it is very useful. One example is with the PSRs, the HTTP interfaces. Or if you want the link interface. Combining these multiple things gets it convenient. One of the reasons why I personally wanted as well, it's for streams. So currently, streams don't have any interface, don't have any classes. PHP basically internally checks when you call like certain string methods. For example, if you try to seek and you provide a user stream, it basically checks if you implement a seek method, which should be an interface. But you can't currently do that. Ideally, you would want to stream maybe like a base class, instead of having like a seekable stream, and rewindabe stream, or things like that. You basically just have interfaces. And then like if somebody wants a specific type of stream, just like a stream, which is seekable, which is rewindable. And other things. We already have that in SPL because there's an iterator. And we have a seekable iterator interface, which basically just ask: Oh, this is there's a seek method. I think it depends how you program. So if you separate the many things into interfaces, then you'll probably use intersections types a lot. If you use a maybe a more traditional PHP code base, which uses union types a lot. Union types are like going to be easier. And you want to reduce that.

Derick Rethans 13:32

Would you think that lots of people already use union types because it's pretty new as well. Isn't it?

George Peter Banyard 13:38

Union types are being implemented in various different libraries. PSRs are updating the interfaces to use union types. One use case, I also have a special method, which was taken the date, it takes a union of like a DateTime interface, a string or an integer. Although intersections types are really new, you hear people when union types were being introduced, you heard people saying, I would promote bad cleaning habits, you shouldn't have one specific type. And if you're using a union, you have a design issue. And I had many people complaining to me why and intersection types of see? Why they haven't intersection types being introduced first, because intersection types are more useful. But then you see other people telling us like, I don't see the point in intersection types. Why would you use an intersection type, just use your concrete class, because that's what you're going to type anyway.

Derick Rethans 14:21

I can give you a reason why union types have implemented first, over intersection types, I think, which is that it's easier to implement.

George Peter Banyard 14:28

It's easier to implement. And it's more useful for PHP as a whole, because PHP functions accepts a union or return a union. Functions return false for error states instead of null. It makes sense why union types were introduced first, because they are mostly more useful within the scope of what PHP does.

Derick Rethans 14:46

Do you think you have anything else to add about intersection types? At the moment, it's already up for voting, when is that supposed to end?

George Peter Banyard 14:54

So the vote is meant to end on the 17th of June.

Derick Rethans 14:57

At the moment I see there's 15 votes for and two against so it's looking good. What's been your most pushback on this? If there was any at all?

George Peter Banyard 15:05

Mostly: I don't see the point in it. However, I do think proper reasons why you don't want it, compared to like some other features where it's more like have thoughts on what you think design wise. But it is undeniable that you you add complexity to the variance. And to the variance check. It is already kind of complicated. I have like a hard time reading it initially. There's the whole parser hackery thing, which is kind of not great. It's probably just because we use like a restricted parser because it's faster and more efficient.

Derick Rethans 15:36

I think I spoke with Nikita about parsers some time ago and what the difference between them were. If I remember which episode it was all the to the show notes.

George Peter Banyard 15:44

And I think the last reason against it is that it only accepts pure intersections. You could argue that, well, if you're adding intersections, you should add the whole feature set. It might impact the implementation of type aliases, because if you type alias T to be a union of A and B, and then you use type T in an intersection, you basically get a mixture of unions and intersections, that you need to be able to work with. The crux of this whole feature is the variance implementation. And being able to rationalize the variance implementation and been to extend it, I think it's the hardest bit.

Derick Rethans 16:18

I guess the next thing still missing would be type aliases, right? Like names for types, which you can't define just yet, which I think you also mentioned in the RFC is future scope.

George Peter Banyard 16:29

Yeah.

Derick Rethans 16:30

Thank you, George, for taking the time today to talk to me about pure intersection types.

George Peter Banyard 16:36

Thanks for having me on the show.

Derick Rethans 16:41

Thank you for listening to this installment of PHP internals news, the podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 87: Deprecating Ticks

PHP Internals News: Episode 87: Deprecating Ticks

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Deprecating Ticks" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi I'm Derick, welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 87. Today I'm talking with Nikita Popov about a much smaller RFC this time: Deprecating Ticks. Nikita, would you please introduce yourself.

Nikita Popov 0:34

Hi Derick, I'm Nikita, and I'm working on PHP core development on behalf of JetBrains.

Derick Rethans 0:40

Let's jump straight into what this RFC is about, and that's the word ticks. What are ticks?

Nikita Popov 0:46

Ticks are a declare directive,. You write declare ticks equals one at the top of your file, and then PHP we'll call a tick function after every statement execution. Or if you write ticks equals two, then as we'll call it the function after every two statement executions.

Derick Rethans 1:05

Do you have to specify which function that calls?

Nikita Popov 1:08

Of course, so there is also a register tick function and unregister tick function and that's how you specify the function that should be called rather the functions.

Derick Rethans 1:17

How does this work, historically, because the RFC talks about the change being made in PHP seven?

Nikita Popov 1:22

Technically ticks work by introducing an opcode after every statement that calls the tick function depending on current count. The difference that was introduced in PHP seven is to what the tick declaration applies. The way PHP language semantics are supposed to work, is that declare directives are always local. The same way that strict types, only applies to a single file, ticks should also only apply to a single file. Prior to PHP seven, it didn't work out way. So if you had declare ticks, somewhere in your file, it would just enable ticks from that point forward. If you included the different file or even if the autoloader was triggered and included a different file that one would also make use of ticks. That was fixed in PHP seven, so now it is actually file local, but that also means that the ticks functionality at that point behaviour became, like, not very useful. Because usually if you want to use tics you actually want them to apply it to your whole codebase. There are ways around that. I'm afraid to say that people have approached me after this RFC and told me that they actually do that. The way around that is to register a stream wrapper. It's possible in PHP to unregister the file stream wrapper and register your own one, and then it's possible to intercept all the file includes and rewrite the file contents to include the declare ticks at the top of the file. I do use that general mechanism for real things in other places, but apparently people actually use that to like instrument, a whole application with ticks, and essentially restore the behaviour we had in PHP 5.

Derick Rethans 3:03

What was the intended use case for ticks to begin with?

Nikita Popov 3:07

Well I'm not sure what was the intended use case, but at least it was the main use case, and that's signal handling. In the PCNTL extension allows you to register a signal handler, and when the signal arrives, we can't just directly call that signal handler, because signals are only allowed to call functions without that our async signal safe. Which excludes things like memory allocation, and a lot of other things that PHP uses. What we do instead is we only set the flag that okay signal has arrived and then we have to actually run the signal handler at some later point in time. In PHP five, that worked using ticks. You declare ticks, and the PCNTL extension registered the tick handler, and then after this flag was set, it would execute your callback on the next tick. In PHP seven, an attentive mechanism was introduced, that is based on virtual machine interrupts. Those were originally introduced for time-out handling, because there we have a similar problem, that when timeout arrives, we might be in some kind of inconsistent state, like the middle of the allocator right now, and if we just bail out at that point, we are likely to see crashes down the road. So that was a significant problem in PHP five. PHP seven changed that. We now set an interrupt flag on timeout, and then the virtual machine checks this flag at certain points. The interrupt flag is not checked after every instruction, but only, like, just often enough to make sure that it's checked, at some point. So that you can't like go in an infinite loop, that ends up never checking. These points are basically function calls, and jumps that go higher up in the function, PCNTL signals can now use the same mechanism. If you call PCNTL async signals true, then those will also set the interrupt flag, and execute the signal handler on the next opportunity. The next time the interrupt flag is checked. The nice thing about that is that it's essentially free. I mean we already, we already have to do these checks for the interrupt like anyway, adding the handling for PCNTL signals doesn't add any cost on top. Unlike ticks, which have to be like executed on every instruction or at least regularly, and that does add significant cost.

Derick Rethans 5:28

Execution time itself because it's an opcode that needs to be executed.

Nikita Popov 5:32

Exactly.

Derick Rethans 5:33

So what are you proposing to do but the ticks in PHP eight one then?

Nikita Popov 5:36

I want to deprecate that. So both the declared directive itself, and the register tick function, unregister tick function.

Derick Rethans 5:44

How could users emulate the same behaviour as ticks allows them to do so now?

Nikita Popov 5:49

That's a good question. As I mentioned, if the use case is, use case of ticks was signal handling, then by using async symbols. If it was something else, then you have a problem. My assumption when writing this RFC was basically that signal handling was really the main remaining use case of ticks, because other use cases require this kind of you know stream wrapper instrumentation, and I didn't expect that people will be crazy enough to use something like that in production.

Derick Rethans 6:21

Hopefully they catch these rewritten files?

Nikita Popov 6:23

Probably yeah. I think it's possible to make this integrate with opcache. If you use it for other purposes, then, I don't think there is a really good replacement. So I think what they use it for is some kind of well instrumentation, so profiling, memory profiling, for example, and the alternative there of course is to use a tool that is appropriate for that job, for example, Xdebug contains a profiler, but of course it is not a production profiler, but I think there are also production profilers.

Derick Rethans 6:54

As far as I know all the production or APM solutions. They do this on their own without having to use sticks. They don't need any user land modifications.

Nikita Popov 7:03

Yeah, definitely. All the APM solutions support this, they use internal handlers.

Derick Rethans 7:08

Because it's actually removing functionalities that some people use, what's the reaction been to removing this functionality?

Nikita Popov 7:14

Well on the mailing list at least positive, but as I mentioned at least some people have like pointed out on the pull request that they are using the functionality.

Derick Rethans 7:23

Enough in such a way to sway for not deprecating them? What is the benefits of getting rid of ticks, if you don't use them?

Nikita Popov 7:31

That's, I think the thing, that there is not really a big benefit to getting rid of them. Like they don't add a lot of technical complexity to the engine. They're pretty simple in that sense. I haven't seen those responses. I'm kind of rolling a bit unsure if we should really remove them, because you could argue that well they don't really hurt anyone. I do have to say that I think all the things that people use sticks for, all the cases I have heard about, and all of those cases ticks are not the right way to solve the problem. They are not the right way to solve the signal handler problem, they are not the right way to solve the profiling problem. And the other one I heard is also they're not the right way to solve the heartbeat problem, to make sure a service stays connected. While people do use them I think they use them for questionable purposes.

Derick Rethans 8:24

Developers, if they're using something to rewrite the PHP file to introduce ticks, they can also technically rewrite a file to introduce calls to their own functions, after every statement.

Nikita Popov 8:34

Yes, I actually have a very nice PHP fuzzing project that rewrites PHP files to introduce instrumentation functions at certain points. That needs a lot more control than ticks, because it's interested in branching statements in particular. That is definitely also possible, but it's kind of even more crazy than just adding ticks. If you're doing it like this, I think, if we want to keep ticks, then we should change ticks from a declare directive to a ini_set, because this kind of rewriting of files to introduce takes that's like not a great solution. On the other hand, that does mean that if you are, I don't know a library, implementing some code and expecting that, you know, it just runs normally, then someone can with by enabling an ini setting will suddenly run code in the middle of your library file that's like essentially any point. So enabling ticks us a major behaviour change, that's something we really don't like to have in ini settings which is I guess also, why does it declare in the first place, because that limits the scope. And you have to go out of your way if you want to not limit it using this rewriting hack. So I'm not really sure ultimately what to do here.

Derick Rethans 9:44

Are you thinking of bringing this up for vote before PHP eight dot one's feature freeze?

Nikita Popov 9:49

If I decide to go for it, then definitely before. I'm just not completely sure on this topic yet.

Derick Rethans 9:55

it'd be interesting to, to hear what other people think about removing this. I have no opinion about this. Other features I do but in this case, I'm happy with them being there, I'm happy with them not being there, because it's something I'm using myself. In any case, thank you for going through this RFC with me today, and we'll see what happens.

Nikita Popov 10:14

Thanks for having me, Derick.

Derick Rethans 10:18

Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug and debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 86: Property Accessors

PHP Internals News: Episode 86: Property Accessors

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Property Accessors" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi I'm Derick. Welcome to PHP internals news, a podcast dedicated to explain the latest developments in the PHP language. This is episode 86. Today I'm talking with Nikita Popov about his massive property excesses RFC. Nikita, would you please introduce yourself?

Nikita Popov 0:32

Hi Derick, I'm Nikita, and I do work on PHP core development, on behalf of JetBrains.

Derick Rethans 0:39

This is probably the largest RFC I've seen in a while. What in one sentence, are you proposing to add to PHP here?

Nikita Popov 0:46

I would say it's an alternative to magic get and set, just for one specific property instead of all of them. That's the technical side. Maybe I should say something about the like motivation behind it, which is that since PHP seven four, we have type properties, that at least for me personally with that feature, the need to have this typical pattern of private property for storage, plus a public getter and setter methods, the main motivation for that has kind of gone away, because we can now use types to enforce any contracts on value. And now these getter and setter methods most if you like boilerplate. So the idea with accessors, at least my idea with accessors is that you really shouldn't use them. You should just have them as a backup option. You declare a public property in your class, and then maybe later, years later, it turns out that okay, that property actually requires additional validation. And right now if you have a public property, then you don't really have a good way of introducing that. Only way is to either break the API contract by converting the property into getter/setter methods where you can introduce arbitrary code, or by using magic get/set, which is definitely possible and persist the API contract, it's just fairly ugly.

Derick Rethans 2:09

You changes the public property that people could read into a private one. And because it's private, the set and get metric methods are being called.

Nikita Popov 2:18

Exactly.

Derick Rethans 2:19

This RFC is titled Property accesses, how do these improve on the situation?

Nikita Popov 2:24

So I think there are really two fairly orthogonal parts to this RFC. The first part is implicit accesses that don't have any custom behaviour, and just allow controlling the behaviour of properties a bit more precisely. In particular, the most important part is probably the asymmetric visibility, where you have a property that's publicly readable, but can only be set from within the class. So public read/ private write. I think that's a, maybe the most common requirement. The second part is where you can actually introduce some custom behaviour. So where you can say that okay, the get behaviour for this property looks like this, and the set behaviour, it looks like this. Which is essentially exactly the same as what magic get/set does, just for a single property.

Derick Rethans 3:10

For example, when you then do set, or you can add additional validation to it.

Nikita Popov 3:14

Exactly. Originally, you had a simple public property, then you can add a setter that checks okay this string cannot be empty.

Derick Rethans 3:23

Okay, what it's the syntax that you're proposing?

Nikita Popov 3:26

I went with these essentially the same syntax that's being used in C#. Looks like you write public foobar, and then you have this sort of semi colon you have a code block. And this code block contains two accessors, so then you have something like get, and another code block that specifies the get behaviour, and set, and the code block that specifies the set behaviour and so on.

Derick Rethans 3:52

The RFC talks about implicit and explicit implementations of these getter and setter accessors. What is the difference between them and how does it look different in syntax?

Nikita Popov 4:03

Yeah so the difference is, either you can write just get semi colon, set semicolon, that's an implicit implementation, or you actually specify a code block with real custom behaviour. To do the implicit implementation, you're saying that this is really a normal property, and PHP automatically manages the storage for you, is that you have this more fine grained control over how it works. Namely what you can do is you can say that you have get and private set. But that's a property that's publicly read only and internally writeable. You can write just get without set, in which case it's a real read only property both publicly and privately, or to be more precise, it's an init once property so you can assign to it once.

Derick Rethans 4:52

How do you keep track of the init once?

Nikita Popov 4:53

It's same mechanism as for Type Properties, where we distinguish between an initialized and an uninitialized property. You can assign to an uninitialized property, but you can't assign to an initialized one, if it's read only. The only maybe problem there is that this mechanism, requires that the property actually is uninitialized to start with, which means that for accessors you don't have any default values. To say there is no implicit default value, no implicit null value. If you want to have a default value the same as with type properties you have to specify it explicitly. Specifying a default value really only makes sense if the property is both readable and writable. For Read Only properties, if you specify the default then you will you can change that.

Derick Rethans 5:37

You have basically have created a constant.

Nikita Popov 5:39

Yes, it is essentially a constant.

Derick Rethans 5:41

You mentioned already, PHP seven four introduced type properties. How do these types interact with the setter and getter accessors?

Nikita Popov 5:50

I would say in the obvious way. The getter is required to return type of property, modulo the usual weak typing conversions, and the setter also checks before it's called whether the passed value matches the type or not. But enforces that matches the type.

Derick Rethans 6:08

This does mean that if you provide an explicit implementation for the set accessor, you also need to specify the parameter name?

Nikita Popov 6:15

No, or you can specify the parameter name, and if you don't then that's just passed in as the value variable. It's also inspired by how C# and Swift do it. I mean there are some possible variations here we could always require an explicit name, some people for that, or I also heard that some people would like to have the name of this implicit variable match the name of the property, instead of always being just value.

Derick Rethans 6:41

Would you have to specify the type though?

Nikita Popov 6:43

You wouldn't have to and you're actually not allowed to. So the accessor implementation is somewhat strict about not allowing you to do anything that would be redundant because otherwise, you know, there are quite a lot of extra things you could be adding everywhere.

Derick Rethans 6:56

That's the same way as marking a property as private. And then the accessors as private as well. Right?

Nikita Popov 7:03

Yeah exactly. So, then that will also say: if the property is already private you can't, again say that the accessors also private.

Derick Rethans 7:11

I think that's the wise thing, otherwise people go overboard with adding private and final and whatever everywhere anyway right.

Nikita Popov 7:18

One could argue that it's really not our business and this is a coding style question, but you know it's better to not leave people, with the option of doing stupid things.

Derick Rethans 7:28

I saw in the RFC that it is also possible to use references with the get accessor. Does this complicated implementation and the idea of this RFC, a lot, or just a little?

Nikita Popov 7:39

I think the important context to keep in mind here is that we already have magic get set, and the accessors are, like, largely based on their semantics. Magic getters already have this distinction between returning by value and returning by reference. The by reference return value is primarily useful for two cases. One and this is really the important one, is if you're working with arrays, any write operation on an array like setting an element or appending an element, those require that the getter returns by reference, because PHP will actually do the modification on the reference. Because some people asked about that. Why can't we just like get the array using the getter, then make the change and then assign back using the setter. That would theoretically work, but it would be extremely inefficient, and the reason is that this breaks PHP's copy on write mechanism. If the error is returned from the getter, then we have one array inside the property. And we have one copy of the array inside the property, and as the return value. Then we change the return value and the resource is now shared, we actually have to copy the whole array, and then we assign it back. So effectively what we do is we copy the array, we do single element change, and then we copy the array, we do a single element change and then we destroy the old array. That works in theory, but it's so inefficient that we would not want to promote this kind of usage.

Derick Rethans 8:42

The way around is of course, is having an implicit methods on the class to make this change to the array itself right?

Nikita Popov 9:10

That would be another option. Problem is that you will need a lot of methods, I mean it's not just a matter of setting a single element or unsetting an element, but you can also set like a deep element where you're not modifying the outermost array but, like, a multi dimensional array. You would actually have to pass through that information somehow as well. I don't think there is a simple solution to that problem beyond the reference based solution that we currently use.

Derick Rethans 9:34

I saw people arguing about not bothering with references in this new implementation at all, but I think you've now made a good case for keeping them.

Nikita Popov 9:42

Effectively not bothering with references just means not supporting that array use case. Which might be, maybe a reasonable limitation, especially if we like make a distinction and supported for the implicit accessor case where we can, you know, do internal magic to support that and not support it in the explicit accessors case. I mean, people were arguing that this reduces the complexity of the proposal, but it kind of also increases the complexity because now we are doing something else for the accessors and we're doing for the magic get/set, where we already have this established mechanism. I'm not really convinced by that.

Derick Rethans 10:20

And I also think it creates inconsistencies in the language itself because it does something different with an implicit or explicit accessor, as well as it being different between the original underscore underscore get magic method as well.

Nikita Popov 10:34

It's not a secret that I'm not a big fan of references, and I would certainly love to get rid of them, but it's a hard problem, and this array modification behaviour for magic get or for get accessors is certainly a large part of that problem, and I just don't have a good solution for it.

Derick Rethans 10:52

I don't either. The RFC also goes into great detail about inheritance and variance. Would you have a few words on that?

Nikita Popov 11:00

I think mostly inheritance works like inheritance does for methods, at least that's how it's supposed to work. Of course there are some interactions, because you can for example mix real properties and accessor properties. In which case, if you have parent accessor property, you can always replace it with a normal simple property, because normal properties they support all operations that accessor properties do. What you can't do is the other way around. If you have a parent normal property, then you can't replace that with an accessor property. And reason is that it does have some limitations. Not a lot, but there are some limitations. One of them is related to references, I mean, we're already talking about this topic. What the by reference get allows is taking a reference to the property, so you can do something like a reference equals the property. What you can't do is the other way around the property reference equals something else. So you can't assign a new reference into the property, that just doesn't work on a pretty fundamental level, because it would require an additional set handler for set by reference. As we don't particularly love references, adding a new mechanism to support that is not a very popular choice.

Derick Rethans 12:20

Variance wise, I guess, the same rules apply as for normal properties and property types?

Nikita Popov 12:27

Approximately. Properties are apparently invariant, so you can't change the type or I mean you can change it but it has to be an equivalent type. If you have a read only property, with only a getter, then the implementation makes the type covariant, which means you can use a smaller type in the child class. This is similar to how if you have a getter method, you could also give it a smaller type in the child class. The converse case, if you have a property that can only be set, then the type is contravariant, you can have larger type in the child class, though I should say that properties that can only be set are somewhat odd and really only supported for the sake of completeness, so maybe it might be worthwhile to drop the type specific behaviour there, because a set only property should already be really rare, and then set property with a contravariant inheritance that's like a edge case of an edge case.

Derick Rethans 13:24

Would it even make sense to support set only properties?

Nikita Popov 13:27

Not sure. So for the C#, implementation, I think they don't support this and there is a StackOverflow question about that, and people try to convince their, that they should support this, that the are really use cases. Currently the imagined use cases are along the lines of injecting values into a class, so using setter injection, just that now it's property based setter injection. Okay, I'll be honest I think it doesn't make sense.

Derick Rethans 13:55

To be fair, I don't think either. It would reduce the length of the RFC a little bit.

Nikita Popov 14:00

A little bit, yes.

Derick Rethans 14:01

Can you say a few words about abstracts, traits, private accessors shadowing and things like that. So a lot of complicated words, maybe you, you can distil that into something slightly simpler.

Nikita Popov 14:12

Well I think actually abstract properties are worth mentioning. In particular, the fact that you can now specify properties inside interfaces. If you have public properties, then it makes sense to have them really on the same level as public methods, so they are part of the API contract, and as such should also be supported in interfaces. Typically what the RFC allows is, you can't specify a simple property in the interface, but you can specify an accessor property, which tells you which operations have to be supported. So you can't have a property declaration that says, it just has a get accessor, or it has get and set. The implementation of course can always implement more, so if the interface requires get, then you can implement both get and set, but it has to implement at least get, either through an accessor offer another property. I think in most cases implementation will just be a normal property.

Derick Rethans 15:03

Because a normal property would implement an implicit get already anyway?

Nikita Popov 15:07

Yeah.

Derick Rethans 15:08

How do property accessors tie in, or integrate with constructor property promotion?

Nikita Popov 15:13

They are supported and promotion with the limitation that it's only implicit accessors. If you use constructor promotion, then you can specify your read only property in there, or property that is Public Read/ private write. You cannot specify a property with complex behaviour in there. This is mainly because it would mean that you embed large code blocks into the constructor signature, which is I think, pushing the limits of shorthand syntax, a bit. Like there is nothing fundamental that will prevent it, it's more a question of style.

Derick Rethans 15:50

The RFC talks a little bit about how, or rather what happens if you use foreach, var_dump, or an array cast on properties with explicit accessor. What are the restrictions here? Is something chasing from normal standard properties like we currently have.

Nikita Popov 16:03

I don't think so. So here is once again the case where we have this distinction between the implicit accessors, which are really just normal properties with limitations. So those show up in var_dump and array cast, foreach, as usual. And we have explicit accessors, which are really virtual properties, so they don't have any storage themselves. Any storage to use, you have to manage separately somehow. So, these don't show up in var_dump, foreach, and so on. Both these actual computed properties, they don't show up because that would require us to actually call all the accessors if you do foreach and that seems rather dubious to me.

Derick Rethans 16:44

How this will work for internal API's that some extensions use to access, like a list of all the properties, for say, for a debugger.

Nikita Popov 16:51

It'll work the same way as var_dump. I mean, in the end it's all, well it's not quite based on the same API's, but still, the answer is the same. You only get those properties that have some kind of backing storage, and those are only the ones that are either normal properties, or the ones with implicit accessors.

Derick Rethans 17:09

That means I need to go find out a way how to be able to read the ones with explicit accessors.

Nikita Popov 17:14

Yeah, if you want to. I don't think that the debugger should read those by default, because that means that doing a dump, will have side effects, which is not ideal, but maybe you want to have an option to show them.

Derick Rethans 17:26

That's something for me to think about, because I'm pretty sure people are going to want to see the contents of these properties, even in a debugger, even though that could mean that are side effects, which I'm not keen on.

Nikita Popov 17:36

I guess that's one of the, I would say advantages of using this over just magic get/set, because actually know which properties you're supposed to look at, with for magic get/set you just don't know at all.

Derick Rethans 17:51

The RFC talks a little bit about the performance impacts and although I saw the numbers I didn't actually read them, when preparing for this recording. What are the performance impacts for implicit accessors as well as explicit ones?

Nikita Popov 18:02

Impact is basically if you use implicit accessors that has similar performance to plain properties, performance is a bit worse. The reason is essentially that we have some limitations on caching. So we can't just cache it as if it were a normal property, because it could have asymmetric visibility. And we reuse the same cache slots for reads and writes. I've been thinking about maybe splitting that up but at least for now there is a small additional performance impact of using implicit accessors, but it's not really significant. On the other side if you use explicit accessors. Those are expensive, they are not quite as expensive as using magic get/set, but they are more expensive than using normal method calls. Reason is basically their normal method calls, they are very optimized, and they do not have to re enter the virtual machine, so we just stay in the same virtual machine loop, and we just switch to different stack frames. For magic get/set we actually have to like recursively call the virtual machine, because we don't have a good point to re enter it, at least based on our current API's. And we also have to deal with some additional stuff, particularly the fact that magic get/set and property accessor as both, they have recursion guards. Normally if you recurse methods in PHP, we don't do any checks about that. Xdebug does, but PHP itself doesn't, so you can infinitely recurse and PHP is fine. The only thing that happens is that at some point you'll run out of memory.

Derick Rethans 19:37

Or when extensions are loaded such as Xdebug, you'll actually still get a stack overflow.

Nikita Popov 19:41

So that's something we should still be addressing, at least the baseline behaviour that you can get to that memory limit error. For properties will set have recursion guards, which say that if you recursively access a property in magic get/set, that it will not call magic get/set again and instead, access the property as if they didn't exist.

Derick Rethans 20:01

Instead of throwing in an error?

Nikita Popov 20:03

Yeah. For property accessors I'm actually throwing an error on recursion, and the reason for that is if we didn't throw an error, then this would end up accessing dynamic property of the same name as the accessor, which would technically work, but it's very likely not what the programmer actually intended. So it's going to be really inefficient because you actually have to allocate space for the dynamic properties and access for those. So if you wanted to have some kind of backing storage for the property, then you should just explicitly declare it and access that, rather than accessing something with the same name and implicitly creating a dynamic property.

Derick Rethans 20:41

Yeah, that sounds all very complicated.

Nikita Popov 20:44

It's cleaner to just make it an error and let the programmer fix it, instead of PHP try to fix it for you.

Derick Rethans 20:51

Are there any BC considerations about the introduction of property accessors?

Nikita Popov 20:55

Not strictly, but I'm sure that it's going to break, various assumptions for people, or at least in the sense that, right now, most assumptions should already be broken through magic get/set. I mean you can always have this kind of magic behaviour. If we have accessors this is probably going to be a lot more common, and people will have to deal with things like properties being publicly readable, but privately writable much as because someone very rarely manually implements that behaviour, but because the language though has native support for it and it's going to be common.

Derick Rethans 21:28

We spoke a little bit about all the different sticking points in his RFC, for example with references, but there's one other thing and I think it's an argument you make somewhere on the bottom of the RFC, that there is a separation between implicit and explicit property accessors. I'm wondering whether it would make sense to consider adding whether the implicit part of this RFC first and then maybe later look at adding explicit property accessors.

Nikita Popov 21:54

That's really the main sticking point, and also my own problem with the RFC. I mean, you mentioned at the very start, that this is a very long RFC and still a little bit incomplete so it's going to be longer. It's a fairly complex feature that has complex interactions with other features in PHP. The implementation is actually, maybe less complex, then you think, given the RFC length. The main concern I have is that, at least for me personally the most useful part of the RFC, are the read only properties. The read only properties and the like Public Read, Private Write properties. I think these two cover like 90% of the use cases, especially because if you have a property that is only publicly readable, then you don't really have to be concerned about this case where you have to, later on, add additional validation. I mean after all the property is read only, or you control all the sets because they're private. There is no danger of introducing an API break, because you have to add additional validation. I think like the largest part of the use case of the whole accessors proposal will be covered by these two things, Or maybe even just one of these two things, that's a bit of a philosophical question. There are some people who think we should have just public read / private write and no proper read only properties, because that like looks the same from the user perspective, but still gives you more flexibility. I think that's like the most important use case, and we could implement that part with a lot less language complexity. So the question is really does it make sense to have this full accessor proposal, if we could get the most useful part as a separate simpler feature, and, well, I heard differing opinions on that one. I was actually pretty surprised that their reception of the on like a full accessors proposal was fairly positive. I kind of expected more pushback, especially as, this is the second proposal on the topic, we had earlier one with, like, similar syntax even though different details, and that one did fail.

Derick Rethans 24:02

How long ago was that?

Nikita Popov 24:03

Oh that was quite a while actually, at least more than five years.

Derick Rethans 24:06

I think that the mindset of developers has changed in the last five to 10 years, like introducing this 10 years ago would never happened, or even typed properties, right. It would never have happened.

Nikita Popov 24:17

That's true.

Derick Rethans 24:19

Do you have any idea when you're going to put us up for a vote? Because, of course, PHP 8.1 feature freezes coming up in not too far away from now.

Nikita Popov 24:28

Yeah, I'm not sure about that. I'm still considering if I want to explore the simpler alternatives, first. There was already a proposal. Another rejected proposal for Read Only properties, probably was called write once properties at the time. But yeah, I kind of do think that it might make sense to try something like that, again, before going to the full accessors proposal or instead.

Derick Rethans 24:54

Do you have anything else to add?

Nikita Popov 24:56

What are your thoughts on this proposal, and the question at the end?

Derick Rethans 24:59

I quite like it, but I also think that it might make sense to split it up. I'm always quite a fan of splitting things up in smaller bits, if that's possible too, and still provide quite a lot of use out of it. And that's why I was asking whether it makes sense to split it up into the implicit part and the explicit part of it, especially because it makes the implementation and the logic around it quite a bit easier for people to understand as well.

Nikita Popov 25:24

It's maybe worth mentioning that Swift also has a similar accessor model but it is more like a composition of various different features like read only properties, properties with asymmetric visibility, and then finally properties with like fully controlled, get and set behaviour rather than this C# model where everything is modelled using accessors with appropriate modifiers. So there is certainly precedent in other languages of separating these things.

Derick Rethans 25:55

Something to ponder about, and I'm sure we'll get to a conclusion at some point. Hopefully some of it before PHP eight one goes and feature freeze, of course. We've been chatting for quite a while now, I think we should call it the end for this RFC. Thank you for taking the time today to talk about property accessors.

Nikita Popov 26:11

Thanks for having me, Derick.

Derick Rethans 26:12

Thank you for listening to this installment of PHP internals news, a podcast, dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 85: Add IntlDatePatternGenerator

PHP Internals News: Episode 85: Add IntlDatePatternGenerator

In this episode of "PHP Internals News" I discuss the Add IntlDatePatternGenerator RFC with Mel Dafert (GitHub).

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi I'm Derick, welcome to PHP internals news, the podcast, dedicated to explain the latest developments in the PHP language. This is episode 85. Today I'm talking with Mel Dafert about the "Add Intl Date Pattern Generator RFC" that she's proposing for inclusion into PHP 8.1. Mel would you please introduce yourself?

Mel Dafert 0:35

Hello, I am Mel. I've been working professionally with PHP for about three years. Recently I started reading the internals mailing list in my free time, but this is my first time contributing.

Derick Rethans 0:46

What made you think starting to read the PHP internals mailing list?

Mel Dafert 0:50

I generally like reading mailing lists and issue trackers. And since I work with PHP, it was interesting to read what's, what's happening.

Derick Rethans 1:02

That's what I'm trying to read this podcast as well of course; explaining what happens in the PHP development. But let's get to your RFC. What is the problem that you're trying to solve for this?

Mel Dafert 1:14

Currently, PHP exposes the ability for locale dependent date formatting with the Intl Date Formatter class. It is basically only three options for the format: long, medium and short. These options are not flexible enough in some cases, however. For example, the most common German format is day dot numerical month, dot long version of the year. However, neither the medium nor the short version provide this, and they use either the long version of the month, or a short version of the year, neither of which were acceptable in my situation.

Derick Rethans 1:47

I realize that you basically ran into a problem that PHP wasn't doing something you wanted to do it. But what made you actually wanting to contribute this?

Mel Dafert 1:57

I ran into this exact problem at work where I wanted to format dates in this specific way. After some research, I found out that ICU, the library that powers Intl Date Formatter, exposes exactly this functionality already. It would be relatively easy to wire this up into PHP and expose it there as well. I also found in a bug report that other people had this problem as well, so I decided to try my best at hacking at the PHP source and make it available to everyone, using PHP.

Derick Rethans 2:25

Had you ever seen a PHP source code before?

Mel Dafert 2:28

I don't think so. No.

Derick Rethans 2:29

But you are familiar with C a little bit?

Mel Dafert 2:32

On a very basic level, yes.

Derick Rethans 2:34

As part of this RFC What are you trying to suggest to add to PHP?

Mel Dafert 2:39

ICU exposes a class called date time pattern generator, which you can pass a locale and so called skeleton and it generates the correct formatting pattern for you. Skeleton just includes which part are supposed to include it, to be included in the pattern, for example the numerical date, numerical month, and the long year, and this will generate exactly the pattern I wanted earlier. It is also a lot more flexible, for example the skeleton can also just consist of the month and the year, which was also not possible so far. I am proposing to add a Intl Date Pattern Generator class to PHP, which can be constructed for locale, and exposes the get best pattern method that generates a pattern from a skeleton for that locale.

Derick Rethans 3:22

The skeletons, what do you specify in these skeletons?

Mel Dafert 3:27

It's a similar format to the pattern itself. For example, it's lowercase y lowercase y uppercase M uppercase M, would give you only the year and only the month, if I'm correct, that's exactly what the skeleton looks like.

Derick Rethans 3:43

But it puts it in the right order?

Mel Dafert 3:45

It puts it in in the right order, and in some cases also adds extra characters, or even changes the format slightly, depending on the locale.

Derick Rethans 3:55

So it is a bit of a flexible way to tell the Intl extension to format them in a slightly more, well how do you say this, a slightly more intelligent way than what the standard, long, short and medium constants do for you.

Mel Dafert 4:11

Exactly.

Derick Rethans 4:12

Why is it so important that you get these formats, right, or rather I should say, how do these locales influence formats and why is this important?

Mel Dafert 4:21

There are conventions of how to format dates and times vary rather strongly between languages and country. In Austria, for example, nobody would expect to understand the US format of month slash day last year. I assume people in England may have the same issue.

Derick Rethans 4:38

I think everybody has that issue except for people in the US.

Mel Dafert 4:42

But that only shows the importance of using a format that people are used to and understand. Other languages like mainland Chinese even have the words for day and month included in the format, as far as I understand. I don't speak Chinese.

Derick Rethans 4:59

Neither do I, but a long time ago when I, when I added the date time support, not Intl, but PHP standard date time support, I also looked at locales that operating systems have. And even these locales, which is not something that Intl uses now, also encode these extra characters at least for Japanese, so that was interesting to see there as well.

Mel Dafert 5:22

There is a lot of sometimes somewhat unexpected formats.

Derick Rethans 5:27

And I think German sometimes once the add the in front, and sometimes behind and things like that. I know there's lots of little intricacies, yes. I see that he RFC makes an argument about which name to pick for the new class. Can you elaborate on the two different options that are?

Mel Dafert 5:44

Yes, this is certainly for us and what I would call bike shedding. ICU has something of an inconsistency in its naming. The formatting class is called date formatter. And the pattern generator class is called Date Time pattern generator.

Derick Rethans 6:00

So it has the extra word time in it?

Mel Dafert 6:03

Between some inconsistency with Intl Date Formatter, which already exists in PHP, and the Intl Date Time pattern generator, or if we make sure PHP is internally consistent and omit the time in all cases. So far consensus seems to lean towards the second option. This is also what the Hack people decided to use.

Derick Rethans 6:24

And I believe that's the one you are wanting to go with in this RFCs as well, right?

Mel Dafert 6:28

Exactly. So far, everybody voted slide, or like express themselves to slightly favour the version without time. So that's the one I'm going with.

Derick Rethans 6:40

Of course, as you mentioned, this is a fairly small change to it, but the RFC talks a bit about things to add in the future, because I believe you weren't suggesting to add all of these Intl functionality straightaway. What is this future scope?

Mel Dafert 6:55

ICU would also expose more methods around the skeletons, for example, turning a pattern back into its skeleton, or building a list of skeleton and then mapping to the patterns from scratch. That's what you would do in theory if you added your own special locale to this.

Derick Rethans 7:17

I'm not sure how to do that with PHP actually, but I think ICU allows you to build your own basically files with settings right?

Mel Dafert 7:25

Exactly. This is omitted all of this, for simplicity, and because they couldn't think of a use case for it, personally, at least. If someone does need them, they could easily be added. It would just be a bunch of extra methods on the, on the class.

Derick Rethans 7:43

I know that ICU has so much functionality that hasn't been exposed to PHP, because there's just so much of it right?

Mel Dafert 7:50

Extremely, yes. I did see that Hack decided to expose all of them, like all the methods that the class has, but I really don't see the use of having to document and test all of these methods when really only one is going to be used. So I've decided to just go for the one that I can actually see people using.

Derick Rethans 8:14

And it is always easy to get smaller parts added to PHP than big things, to begin with.

Mel Dafert 8:21

Exactly.

Derick Rethans 8:22

How has the reception been so far?

Mel Dafert 8:24

I haven't gotten feedback from too many people, but it seems positive so far. A few people that did give some feedback were constructive and seem to seem to like the idea of adding this.

Derick Rethans 8:36

I reckon outside of English speaking countries this is quite an important thing to actually support, especially as we just discussed, people are picky about how these things are formatted.

Mel Dafert 8:46

Very picky.

Derick Rethans 8:48

So the name that you're going for would be Intl Date Pattern Generator, would it also support patterns for the time itself?

Mel Dafert 8:55

Of course, just like Intl Date Format also support formatting time.

Derick Rethans 9:02

It would be strange if it didn't, to be honest.

Mel Dafert 9:04

Yeah.

Derick Rethans 9:05

When do you think you're going to put us up for a vote for inclusion to PHP 8.1?

Mel Dafert 9:10

I think I sent out the first email about two weeks ago for opening the discussion. So I was planning to send out the heads up, either today or tomorrow, and opening the vote after that.

Derick Rethans 9:23

Okay. To be fair, I think there is very little controversy in this one, so it would surprise me if it didn't pass.

Mel Dafert 9:30

That's reassuring. I am somewhat anxious about them.

Derick Rethans 9:33

It's not controversial, it is an, it is perhaps a niche thing but it is something that is useful, so I can't see people really be opposing to this. To be fair, I think it looks like just an omission from when the Intl extension was written in the first place.

Mel Dafert 9:48

That's true. It might have not been supported in ICU at that point.

Derick Rethans 9:54

That is a good point as well because I think the Intl extension came with PHP five three, or five four, which I think is now eight years ago or something like that.

Mel Dafert 10:04

I think, I think ICU might have not had it at the end. It's an old word, like it's an all supported versions of PHP.

Derick Rethans 10:13

That is good to know. Would you have anything else to add?

Mel Dafert 10:16

No, I think that's it.

Derick Rethans 10:17

Thank you for taking the time today to talk to me about your proposal to add the Intl date pattern generator to PHP 8.1

Mel Dafert 10:25

Of course. Thank you for having me.

Derick Rethans 10:29

Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 84: Introducing the PHP 8.1 Release Managers

PHP Internals News: Episode 84: Introducing the PHP 8.1 Release Managers

In this episode of "PHP Internals News" I converse with Ben Ramsey (Website, Twitter, GitHub) and Patrick Allaert (GitHub, Twitter, StackOverflow, LinkedIn) about their new role as PHP 8.1 Release Managers, together with Joe Watkins.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi, I'm Derick, welcome to PHP internals news, a podcast, dedicated to explaining the latest developments in the PHP language. This is episode 84. Today I'm talking with the recently elected PHP 8.1 RMs, Ben Ramsey and Patrick Allaert. Ben, would you please introduce yourself.

Ben Ramsey 0:34

Thanks Derick for having me on the show. Hi everyone, as Derick said I'm Ben Ramsey, you might know me from the Ramsey UUID composer package. I've been programming in PHP for about 20 years, and active in the PHP community for almost as long. I started out blogging, then writing for magazines and books, then speaking at conferences, and then contributing to open source projects. I've also organized a couple of PHP user groups over the years, and I've contributed to PHP source and Docs and a few small ways over the years, but my first contributions to the project were actually to the PHP GTK project.

Derick Rethans 1:14

Oh, that's a blast from the past. You know what, I actually still run daily a PHP GTK application.

Ben Ramsey 1:21

Oh, that's interesting. What does it do?

Derick Rethans 1:23

It's Twitter client.

Ben Ramsey 1:24

Did you write it.

Derick Rethans 1:26

I did write it. Basically I use it to have a local copy of all my tweets and everything that I've received as well, which can be really handy sometimes to figuring out, because I can easily search over it with SQL it's kind of handy to do.

Ben Ramsey 1:41

It's really cool.

Derick Rethans 1:42

Yep, it's, it's still runs PHP 5.2 maybe, I don't know, five three because it's haven't really been updated since then.

Ben Ramsey 1:49

Every now and then there will be some effort to try to revive it and get it updated for PHP seven and eight, but I don't know where that goes.

Derick Rethans 1:59

I don't know where that's gone either. In this case, for PHP eight home there are three RM, there's Joe Watkins who has done it before, Ben, you've just introduced yourself, but we also have Patrick Allaert, Patrick, could you also please introduce yourself.

Patrick Allaert 2:13

Hi Derick, thank you for the invitation for the podcast, my name is Patrick Allaert. I am a Belgian freelancer, living in Brussels, and I spent half of my professional time as a IT architect and/or a PHP developer, and the other half, I am maintaining the PHP extension of Blackfire, a performance monitoring solution, initiated by Fabien Potencier.

Derick Rethans 2:39

I didn't actually know you were working on that.

Patrick Allaert 2:40

I'm not talking much about it but more and more. So I succeeded to Julian Pauli, who by the way was also released manager before so now I'm working with Blackfire people. It's really great, and this gives me the opportunity to spend about the same amount of time developing in C and in PHP. This is really great because at least I don't. It's not just only doing C. I, at least I connect with what you can do with PHP. I see the evolution from both sides. And this is really great. It's great, it's also thanks to you Derick, you granted me access to PHP source codes. That was to contribute to testfest something like 12/13 years ago, it was, CVS, at that time.

Derick Rethans 3:28

CVS, so now I remember that. Basically, what you both of you're doing is making me feel really old and I'm not sure what I like that or not. I think we all have gotten less head on our heads and greyer in our beards. In any case, what made you volunteer for being the PHP 8.1 RM?

Patrick Allaert 3:46

In my case, I think there were two two reasons is that PHP really brings a lot to me in my career, everything is built around my expertise in PHP and its ecosystem. By volunteering as a release manager. I think I can give something back to PHP, because the last time I contributed to source code of PHP, it was really years ago. If I remember it was array to string conversion that was very silent and not emitting any notice; now it's warning. In the meantime, so I think that was PHP 5.0,

Derick Rethans 4:22

Ages ago.

Patrick Allaert 4:23

Ages ago. Indeed. I was quite passive I was mostly reading on PHP internals, and most of the time now that is quite big so if, if I had to say something I could always see some someone who already just said the same thing so I was not saying: plus one. This is one of the reason and the second one I think is that I think it's kind of a unique opportunity, and I can learn a couple of things. I think, on day one when the Rasmus gave me the access, saying that I can do to OAuth authentication on SSH and that was: okay, day one I already learned something, so that was really cool.

Derick Rethans 4:58

And you Ben, I think you tried to be the PHP eight zero release manager as well at some point. That didn't happen at the time, but you've tried again.

Ben Ramsey 5:06

I almost didn't try again. I don't know why but when Sara announced it this year, I thought about it, and I don't know, I tossed it around a little bit, but I've been wanting to do it for a long time and I've noticed as Joe Watkins recently put it on a blog post that we need to help the internals avoid buses. So since this is a programming language that I've spent a lot of time with just as Patrick mentioned, both in and out of my day jobs. I want it to stick around to thrive. Since I'm not a C guru, but I do have a lot of experience managing open source software. I wanted to volunteer as a release manager, and I hope that I can use this as an opportunity to inspire others who might want to get involved, but don't know how.

Derick Rethans 5:55

And of course you just mentioned Joe, Joe Watkins, who is the third PHP release manager for 8.1, and that is a bit of a new thing because in the past, when the past many releases I can remember you've only had two most of the time.

Ben Ramsey 6:09

I think, on the mailing list that came up early on in the thread, and there was a general consensus, I think, consensus may be the wrong word, but there were a couple of people who spoke up and said that they wouldn't mind seeing multiple rookies or mentees or whatever you want to call us, and Joe when he volunteered to be the veteran, and he was the only one who volunteered as the veteran. He said that he would take on two. And so that's that's why Patrick and I are both here and I think that's a good idea, because it will continue to help, you know, us to avoid buses.

Derick Rethans 6:46

Yep. And if you're three, you only have once every 12 weeks. Whereas of course, in my case doing it for PHP 7.4 it's every four weeks, because it's me on my own, isn't it. Which is unfortunate that these things happen because people get busy in life sometimes. Getting started being a PHP release manager can be a bit tricky sometimes because just before we started recording, I had to add you to a few mailing lists. Do you think you've now have access to everything, or what do you need access to to begin with?

Patrick Allaert 7:18

There is the documentation about release managers, what are you supposed to do, and, and there is an effort of documentation, what you have to ask, in terms of access, and that's great. We are probably going to contribute with our findings to, to improve the documentation. Once you did a bit of the setup, mainly needs to access the servers. You should also know what is the workflow and what are the usual tasks. This is mentioned in the documentation, but I think it would be better to have a live discussion with someone that already did it. The fact that we are doing it with Joe Watkins, who is not only a release manager of 8.1, but also previous release manager, that should be really smooth, to, to see what the the orders and what is the routine to do. To do so, why do you think Ben?

Ben Ramsey 8:16

I agree. I think that, I mean we've only just gotten started. It's only this I believe is what was it two weeks ago that we, that this was announced. So this is the first time that Patrick and I have actually spoken face to face. Hi, Patrick! We've communicated by email and slack. I'm sorry not Slack, StackOverflow chat. Joe has given us a lot of good pointers. I feel like some of the advice he's given his been really good, but it's like Patrick said, we haven't really had like a live, like one on one chat, or face to face chat, where we could kind of get caught up on things and understand what the flow looks like. So last week I started going through a lot of the pull requests on GitHub. And I've been tagging them as bug fixes or are enhancements, and there's also an 8.1 milestone that I've been adding to a lot of the tickets, are the pull requests, and I've merged a few of them, but I think that I've merged them a little prematurely. So there were some funny things that came up out of that. I do plan to blog on this, but one of Nikita's comments in the Stack Overflow chat was, you've just made it your personal responsibility to add tests for uncovered parts of the Ristretto255 API.

Derick Rethans 9:40

Right, exactly say because I'm doing release management for PHP seven four. I don't do any merging at all. The only thing I'm doing is making the packages, and then coordinating around them. I'm not even sure whether it is a responsibility of a release manager to do.

Ben Ramsey 9:55

It may not be a responsibility. I felt like it was helpful maybe to go ahead and take a look and see where things were trying to follow up with people, to get them to respond if something had been sitting there for two weeks or so without any kind of movement. I would, you know, leave a message saying what's the status of this.

Derick Rethans 10:19

I know from the documentation that we have on our Release Management process. And many of these steps actually been replaced by a Docker container that actually builds the binaries, so I'm not sure whether Joe I've mentioned that to you yet, because I'm not sure whether that was around when he did release management, the previous time.

Ben Ramsey 10:36

Right, it wasn't around either when he did release management, but he's also mentioned that he would like for us to learn how to do it without the Docker container, even if we do plan to use the Docker container.

Derick Rethans 10:48

That's fair enough, I suppose. I have never had to do that, but that there you go. Now, what is the timeline like?

Patrick Allaert 10:56

In terms of timeline I think the very first thing is being all three release managers having live discussion to define what, what we should do, when we should do, and how. This way we clearly knows our responsibility and the sequence, and also how we are going to organize. Do we do every three releases? We share the task? How are we going to do the work together. In terms of timeline I think the very first release is going to happen in June, if I remember correctly. I set up an agenda sheet with ICAL so that we all can put that in our calendar, nothing really clear on my side.

Derick Rethans 11:41

From what I can see from the to do list that the first alpha release is June, 10, which is exactly a month away from when we are recording this.

Patrick Allaert 11:51

Right, yeah, it's one month come down before the very first one. I think it might be great that the very first release being made by by Joe, so that we can really see every single step he's doing, so that we can do the same. However, I guess it's kind of a shared responsibility to do triage of bugs and pull requests.

Ben Ramsey 12:14

Right. I think there is some desire among the community to see these releases in real time at least a few of them. So I'm going to try to encourage us to stream some of them maybe live, or at least record it and put it up somewhere for people to kind of just see the process to demystify it, so to speak.

Derick Rethans 12:35

I actually tried it a few months ago to record it, but there were so many breaks and pauses and me messing things up, and me swearing at it, that I had to throw away the recording. I mean the release went out just fine but like absolute as again... I can imagine the first few times, you're trying this there might be some swearing involved, even though you might not vocalize that swearing.

Ben Ramsey 12:56

Oh I'll vocalize it.

Derick Rethans 12:58

Fair enough. This is something that is that you're going to have to do for the next three and a half years. Do you think you'll be able to have the time for it in another three years?

Ben Ramsey 13:08

I mean for myself I I'm committed to it, I definitely believe that I'll have the time over the next three and a half years, and I'll make the time for it.

Derick Rethans 13:18

What about you, Patrick?

Patrick Allaert 13:20

Exactly the same. I think it's the least that I can do to PHP, in terms of contributing back, there will be some changes because I it's not like it's, it's not like the infrastructure is something that doesn't change, like for example recently, GitHub, being more having more focus rather than our Git infrastructure. So the changes that will happen, we will have to adapt, I have the impression that release manager has to, every time it's adapting to change this, and that will be very interesting.

Derick Rethans 13:53

Luckily we haven't had too many. The only thing I had to change with a change from git dot php.net to get up, was my local remote URLs. So there wasn't actually a lot to do, except for running git remote set-url. I was pleasantly surprised by this because if anything messing around with Git isn't my favourite thing to do.

Ben Ramsey 14:14

Also, merging is a little bit more streamlined now you don't have to go to qa.php.net to do that.

Derick Rethans 14:21

I've never done anything without

Ben Ramsey 14:23

Really? Oh, I guess you would commit directly to git.php.net?

Derick Rethans 14:27

Yep.

Ben Ramsey 14:28

If there were PRs on GitHub, the only way to merge them well, probably wasn't the only way but one way to merge them was to go to qa.php.net, and if you were signed in with your PHP account, you were able to see all the pull requests, and choose to merge them.

Derick Rethans 14:46

Yep, also something I've never done as an RM. The only way how I have reacted with pull requests is commenting on the pull requests, and I wouldn't merge them myself. With the only exception of security releases where you need to cherry pick from certain branches into your release branches. I'm not always quite sure about it as the responsibility for release managers actually do the merging into the main branches. From what I've understood is it's always the people that made the contributions, who just merge themselves, and you then sometimes need to make sure that they merge into the right branch instead of just master, which is what, as far as I know, the, the buttons on GitHub do.

Ben Ramsey 15:21

Well the individual contributors, in this case, if they're doing like a bug fix or something, most of them, or many of them aren't don't have permission to do the merging, so someone else has to merge it, like, often I see Nikita merging, a lot of the pull requests.

Derick Rethans 15:37

Maybe I've just been relying on Nikita to do that then. I'm not sure how, bug fixes are merchants debug fig branches. I think it's usually been done by people that have access already anyway, because it's often either Nikita or Cristoph Becker, or Stas, and the main developments, or the main other new things that people don't have access to are usually to master. So I guess there's a bit of a difference now. I'm not sure what if any other questions, actually, would have anything to add yourself?

Patrick Allaert 16:05

maybe something that would be quite challenging is the very recent discussion about the system that we, that we might change from. The system or the issue tracker with where we have all the bugs. I understand the current issues, I understand as well the drawbacks of what is possibly, for example GitHub issues. It might be great for some, would it be great for us? If we do it was going to be in the bring a lot of changes, and I think, 8.1 will be already slightly impacted by the change to GitHub in terms of pull request strategies, but potentially there will be another change, which is around the bugtracking system.

Derick Rethans 16:54

I have strong opinions about this, but we'll leave that for some other time. What about you, Ben?

Ben Ramsey 17:00

Right, I actually don't think that we're going to end up making a lot of changes in that regard, very, like, not in the near term, probably. But I did want to point out, or promote that I've started journaling some of these experiences, and capturing information mainly for my own purposes, but I'll be posting these publicly so that others can follow along. My blog is currently down right now.

Derick Rethans 17:28

That's because you're using Ruby isn't it?

Ben Ramsey 17:30

That's because I'm using Ruby. The short story of it is that there are some gems that were removed from the master gem repository at some point in the past, or the versions I'm using were removed, either for security reasons or what I have no idea why. And that's put, put it into a state where I just can't easily update. I just haven't, I just don't care, right now, so I plan on migrating to something else. In the short term, I'm not going to be doing that. So I've started writing at https://dev.to/ramsay and Dev.to is just a developer community website. If you're on Twitter. It's run by @thePracticalDev, I'll be, I'll be blogging there.

Derick Rethans 18:18

And I'll make sure to add a link to that in the show notes as well. Thank you for taking the time this afternoon, or morning, to talk to me about being a PHP 8.1 release managers.

Ben Ramsey 18:28

Thank you for having me on the show.

Patrick Allaert 18:30

Thank you, Derick for that podcast. I'm really glad you invited us.

Derick Rethans 18:39

Thank you for listening to this installment of PHP internals news, a podcast, dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 83: Deprecate implicit non-integer-compatible float to int conversions

PHP Internals News: Episode 83: Deprecate implicit non-integer-compatible float to int conversions

In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Deprecate implicit non-integer-compatible float to int conversions" RFC that he has proposed.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 83. Today I'm talking with George Peter Banyard, about another tidying up RFC, George, would you please introduce yourself?

George Peter Banyard 0:31

Hello, my name is George and I work on PHP in my free time.

Derick Rethans 0:35

Excellent. I was just talking to Larry Garfield, and he was wondering whether you or himself, are the second often guests on this podcast, but I haven't run a stats. But it's good to have you on again. Following on for from other numeric RFCs, so to speak. This one is titled deprecate implicit non integer compatible floats to int conversions. That is a lovely small title you have come up with.

George Peter Banyard 1:01

Yeah, not the best title.

Derick Rethans 1:03

What is the problem that this RFC is trying to solve, or rather, what's the change that is in this RFC is trying to solve?

George Peter Banyard 1:11

Currently in PHP, which is a dynamic language, types are not known at the statically at compile time, so it's so everything's mostly runtime. And most type conversions are relatively sane now in PHP 8, because like numeric strings have been kind of fixed. But one last standing issue is that floats will pass an integer check, without any notices or warnings. Although floats, don't usually fit in integer will have like extra data which can't be represented as an integer. For example, they can have a fractional part, or they can be infinity, or not a number if you divide, like infinity by infinity, or 0 over 0 or other things like that.

Derick Rethans 1:55

These are specific features of floating point numbers on computers?

George Peter Banyard 1:59

Yes.

Derick Rethans 2:00

Is there any prior work that is RFC is building on top of

George Peter Banyard 2:03

It builds up on top on the saner numeric string RFC, because it tries to like make the whole numericness of PHP, as a concept better and like less error prone, but in essence it's mostly self contained. If you use a floating point number, were you should be using an integer. If the floating point number, is considered an integer because it only has like decimal zeros, and it fits in the integer range, then you'll have like no error. So if you use 15.0 as an array key, it gets converted to 15, you'll get don't get any error because it's like well it's just 15 like it doesn't mind. But if you do 15.5, then you'll get like a, like a deprecation notice which will tell you it's like, well, here's the key gets implicitly converted, you should be aware of this because if you use 15 somewhere else, you'd be overriding the value.

Derick Rethans 2:54

And that currently doesn't happen yet. And you say, fits in the integer range, what ranges are we talking about here?

George Peter Banyard 3:01

On 32 bit, which I would imagine most people don't use any more, is a very just like minus 2 billion to 2 billion, because PHP integers are signed, and on 64 bits, it's like nine quintillion?

Derick Rethans 3:15

It is a 64 bit range.

George Peter Banyard 3:17

You should be fine by not hitting them, but like if you do some maths computation with it, you might hit like the boundaries, or you do like very edge cases where you try to like mess up with PHP, like that so it's something which you can do it.

Derick Rethans 3:30

From what I understand floating point numbers they store, integer things as well as fractional things, and from what I remember is that the range that floating point numbers can store things in without losing any precision is something like 53 bits I think. So if it's larger than 53 bits, then it would have to store something in its floating point part of it, and hence starts losing numbers.

George Peter Banyard 3:56

Yeah, I'm not the expert on floating point numbers.

Derick Rethans 4:00

They are tricky.

George Peter Banyard 4:01

They are tricky and the standard is very confusing at times, especially with like NaNs and like signalling NaNs and like, but the basic concept is like exponential numbers like exponential like scientific notation. You have like your base number, and then you have like a power, and then just gives you like a larger range, but with exponential scientific notation, you also lose like precision, because you don't really care about like the minute numbers.

Derick Rethans 4:24

This is why there is a conversion issue both ways right, a floating point number, without fraction or NaN or INF, will always fit in the 64 bit integer.

George Peter Banyard 4:32

Yeah.

Derick Rethans 4:33

But although you can represent a 64 bit integer in a floating point number. You can't do that without losing data if the integer takes up more than 53 bits, so the round trip conversion also has issues.

George Peter Banyard 4:44

That's why, like, that's the check. I'm using, because initially I was just doing an F mod check, because I was like oh just check for like fractional parts, and then Nikita was like: Well, you probably should do round trip checks. Because you also catch infinity, not a number, which also has like some interesting implications, that like if your floating string is considered infinity and you cast it to an integer, you get max int. If it's a floating point number, you'll get zero, which is an handy thing that needs to be like also dealt with, because I just discovered that while working on that. Trying to already get rid of like conversions, is I think a good first step on making most things sane. And we already do that with string offsets. So it's also just like making it more of a global aspect of the language.

Derick Rethans 5:35

So this RFC only talks about converting floats to int, but not int to float?

George Peter Banyard 5:40

Yeah, because mostly integer to float is a, is a safe conversion, because you can, it fits usually in a floating point number, except apparently 64 bits.

Derick Rethans 5:52

I think it is something that we actually should also look at and this is not something I'd realized because I originally thought that before reading the RFC, this that is what you were trying to get, but it's they other way, the other the way around here; so I can see another upcoming RFC to do the other side of the conversion as well.

George Peter Banyard 6:11

I imagine so. I put that on my to do list, which is already growing larger and larger with every small idea. I encountered in the language which I'm like, why on earth is PHP doing that?

Derick Rethans 6:22

But to get back to this RFC in which kind of situations can this trip up developers?

George Peter Banyard 6:27

I would expect most of the time it shouldn't, because this every time you use integers, floating points is mostly maths code, or, if you're doing something very weird, like storing money as a floating point number, which you shouldn't do, but people do it anyway.

Derick Rethans 6:45

Does PHP have an arbitrary precision type?

George Peter Banyard 6:48

No we don't. But you can use GMP.

Derick Rethans 6:52

I don't know what it stands for either, what GMP stands for. They also used to be BCMath, is that's still around as well?

George Peter Banyard 6:58

Yeah, BCMath is still around. Most of the time you don't need arbitrary precision, at least for traditional PHP code which is a web based and possibly like E commerce so you're not hitting like insane numbers, but it is mostly full of direct cases or also with like string conversions to like integers, that's I think like, my main point is to try make also string conversions to then numeric type like to make them safer. I think was the previous RFC was the saner numeric string, there's maybe an expectation that you can finally bypass the strict type mode, because everything is strings in HTTP land. So if you get a value, and you just wanted to let like the engine, take care of like making it, it's a valid number and it doesn't lose precision, and you get an integer. That makes it helpful that you get these warnings and notices that hopefully in PHP nine which is who knows how many years away, we finally can like lock down on these edge case behaviours.

Derick Rethans 7:57

The RFC is not making this stop working, but rather it will throw a deprecation notice?

George Peter Banyard 8:04

Yes. Currently, yes.

Derick Rethans 8:06

Why do you say currently?

George Peter Banyard 8:07

The plan is in PHP nine, to make this a type error, which initially, I wanted to make it a warning instead of a deprecation notice, but then people on list were, well, like a warning is too strong, and it doesn't imply anything. And if you want to change this to like a type error you should make it a deprecation notice because it means the behaviour will stop working in the, in the later version. So that's why I changed it to the deprecation notice in the second iteration of the RFC.

Derick Rethans 8:34

Because, I mean, she just said that could potentially impact already existing code. What kind of BC issues are ever this, by introducing this deprecation warning?

George Peter Banyard 8:43

There are various operators, that will implicitly convert floating or float strings to integers. So those are like bitwise operators, shift operators, the modulo operator, the assignment operators of the above. If you try to assign a float to an integer type property. If you try to pass a float to a parameter integer type, or as a return type. Those will show deprecation notices. And then only for floats, not float strings, is the bitwise NOT operator, because that one works with strings as well. And if you use a float string it will use the normal string semantics. And then, as an array key because floating strings already so I noticed was as an array key.

Derick Rethans 9:28

Do you think it is better to have a deprecation notice than in stead PHP silently truncating data?

George Peter Banyard 9:35

Yeah, if you want that behaviour of like implicitly truncating, you can always use an int, cast, which will do the job for you. Which makes the code explicit and tells the intention of the developer, instead of just like, oh I got a float here, pass it to an integer.

Derick Rethans 9:49

What's the reaction to this been so far?

George Peter Banyard 9:51

Not many reaction on list, but voters currently one weekend, and it's been unanimously approved, so I'm pretty happy that most people are for it.

Derick Rethans 10:02

It's always good to hear unanimous agreement, maybe I should switch my vote to No. As you have said the reaction has been fairly good. And obviously this RFC passed, so the reaction was good enough for this to pass. Do you think there will be some follow up RFCs for ironing out more things like this?

George Peter Banyard 10:20

Possibly, I don't know if I'll get them into PHP 8.1. Because time, and I've got some other projects. But I think, maybe, to see you, I've just learned that like some integers lose precision as in floating point numbers, which I wasn't aware of. What's maybe a bit more controversial is to change the behaviour of casting floats, which don't fit into an integer range to now produce Max int, or minimum Int, instead of zero. You will need to put like deprecation notices or warnings when you use an explicit cast, which I don't know how people will feel about that.

Derick Rethans 10:58

I see what you mean there. It will be an interesting discussion for when that happens I would say.

George Peter Banyard 11:04

Yep.

Derick Rethans 11:05

Would you have anything else out about is RFC itself?

George Peter Banyard 11:08

Not really it's mostly straightforward. All the details are in the RFC, all the BC breaks are in the RFC. If you're an extension maintainer, there's only one BC break with like a function. When you take Zval and you convert it to an integer, you'll get a notice, which I expect most extension maintainers want their users to know that this is going to like throw at the later point. But you can also then do it manually if you want to support this behaviour implicitly in your extension.

Derick Rethans 11:36

I think it is important that extension, that for extensions be if it doesn't suddenly change, but forcing an API change on them is often a better way than deciding to changing an existing API, I think.

George Peter Banyard 11:47

The problem is is the API I'm using is used all over the PHP source code, changing that everywhere, felt a bit like hassle, but I've added like a C function which is long compatible, so you can check in advance if it will also do stuff like that. And then there's also a version which, which doesn't serve any notices so you can do it anyway.

Derick Rethans 12:08

And that is a new function I suppose?

George Peter Banyard 12:10

Yes.

Derick Rethans 12:11

I think it's something that extension authors should look at in any case, I mean, we have this lovely upgrading dot internals file, where this certainly should fit in as well in that case, I suppose.

George Peter Banyard 12:22

Yeah, it'll fit in. It's currently not that big as a file that usually gets big, a bit before feature freeze because all the changes, land then.

Derick Rethans 12:30

I know how this goes. This is also exactly the next debug starts breaking again because of API changes. So far I have been lucky there, so there's not been too many in PHP eight one. Do you know actually how much time there is until feature freeze?

George Peter Banyard 12:45

I would imagine it's end of July, as usual, that's the usual timeline. I don't know because RM selection hasn't happened yet, so I don't know how long that usually takes.

Derick Rethans 12:54

You're talking about RM for release manager selection here. Once this happens all hope to talk to the new release managers as well, and get them to introduce themselves here.

George Peter Banyard 13:02

Seems like a good idea.

Derick Rethans 13:03

To chats about any favourite things for PHP eight one. All right, George, thank you for taking the time this afternoon to talk to me about another tweak to PHP's handling of numbers in general, and I'm sure it won't be the last one.

George Peter Banyard 13:19

Thanks for having me, and I'll talk to you soon.

Derick Rethans 13:22

Hopefully in a pub with a pint.

George Peter Banyard 13:24

Yeah, that would be nice.

Derick Rethans 13:27

Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 82: Auto-Capturing Multi-Statement Closures

PHP Internals News: Episode 82: Auto-Capturing Multi-Statement Closures

In this episode of "PHP Internals News" I chat with Larry Garfield (Twitter) and Nuno Maduro (Twitter, GitHub, Blog) about the "Auto-Capturing Multi-Statement Closures" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 82. Today I'm talking with Nuno Maduro and Larry Garfield. Nuno, would you please introduce yourself?

Nuno Maduro 0:30

Hi PHP developers. My name is Nuno Maduro, and I am software engineer at Laravel, the company that owns the Laravel framework, and I have created multiple open source projects for the PHP community, such as Pest PHP, Laravel zero, collusion and more.

Derick Rethans 0:48

Alright, and Larry, could you please follow up on that.

Larry Garfield 0:51

Hello world, so I'm Larry Garfield. You may know me from several past podcasts here, various work in the PHP fig, and all around gadfly and nudge of the PHP community.

Derick Rethans 1:03

Good to have you again Larry and good to have you here today, Nuno. The RFC, that we're talking about here today is to do with closures, and the title of the RFC is auto capturing multi statement closures, which is quite a mouthful. Can one of you explain what this RFC is about?

Nuno Maduro 1:20

As you said, the RFC title is indeed auto capturing multi statement closures. But to make it simple, we are really talking about adding multi line support to the one line arrow functions that got introduced it, in PHP 7.4. Now, this new multi line arrow functions have exactly the same features as the one line arrow functions, so they are anonymous, locally available functions; variables are auto captured lexically meaning that you don't actually need the use keyword to manually import the use of variables, they just get auto captured from the outer scope. And the only difference really is one line arrow functions have a body with a single expression. This RFC actually allows you to use a full statement list that possibly ends with a return.

Derick Rethans 2:18

Excellent, what the syntax that you're proposing here?

Nuno Maduro 2:22

Well, as you may know, one line arrow functions have the syntax, which is fn, parameter list, and then that arrow expression thing, and this new RFC proposes that, optionally, developers can pass a curly brackets with statements, instead of having that arrow expression syntax. Now, this curly brackets with statements, simply denotes a statement list that potentially ends with a return. Concerning the Auto Capture syntax, we will be just reusing the Auto Capture syntax and feature that already exists on one line arrow functions, meaning that you don't need the use keyword to manually import variables. And of course, the syntax itself, in the in the feature, works the exactly same way. Concerning the syntax, it's also important to mention that this RFC was done in combination with the short functions RFC from Larry, but I think I'm going to let Larry speak about that later on this episode.

Derick Rethans 3:26

What's the main idea behind wanting to introduce this auto capture multi statement closures. Because from what I understand the arrow is now gone, and it's been replaced by just a function body within curly braces. But why would you want to extend a single expression to like multiple statements?

Nuno Maduro 3:44

Well, we all know that long closures in PHP can be quite verbose, even when you want to perform a simple operation. And this is largely due to the syntax syntactic boilerplate that you need to use when using long closures to manually import all the variables with the use keyword. Now, while one line arrow functions solve this problem to some extent, there are also a few cases that you might want to leverage the simplicity of auto capturing, but using two or three lines in a statement list. And one example I can think of is that when you are within a class method with multiple arguments on it, you might want to just return a closure, using all the arguments of that method, and actually using the use keyword in least all of the arguments is in this case, redundant and even pointless. It is also some use cases for example with array filter or similar functions, where they use keyword just adds some visual complexity to the code. We believe that the massive majority of PHP developers are really going to appreciate this RFC, as it makes just the code more simpler and shorter. The community loved these changes really, as proven on property promotions, one line arrow functions, or even the null safe operator.

Derick Rethans 5:10

And I think this is something that the PHP language itself is moving forwards to, right. I see in the RFC that you're, you're trying to make sure that syntax stays consistent with itself and you use a word called sigils here for some of these things. What were the important parts of making sure that the syntax stays the same?

Larry Garfield 5:30

So this actually relates to the short functions RFC. So the short functions RFC, as we discussed in previous episode, I'm trying to make it possible to write single line expression named functions in a more compact way. And that is a different problem with different syntax than auto capturing closures, which is why we needed to work together on these, because we want to make sure that the syntax for the various different kinds of functions that PHP supports, are all consistent with each other, and this piece of syntax indicates this thing consistently, rather than: in some cases this syntax means this, and other cases that means this other thing. Didn't want to end up with that kind of mess. After the short functions RFC, one part of the feedback was, there are people working on auto capturing long closures. Y'all should work together and make sure that the syntax plays nicely. And, for various reasons I just sat on it for a while, before finally getting hold of Nuno earlier this year, and we got together and talked and made sure that what he was doing and what I was doing, the syntax complemented each other. We ended up with in our discussion and analysis is that you can have a function that is named or anonymous. It can have Auto Capture or not. And it can either be a single expression with no return statement because it just evaluates to that value, or it can be a list of statements, one of which could be returned, potentially multiple could be returned. And right now we have syntax in PHP to support three possible combinations. But there's actually eight possible ways to combine those two, those three variants. We looked at all right, we have three of the eight, which of the others makes sense to have, and the two that makes sense to have are: short functions named function, no closure, and I say an expression body is what my RFC does. And then, anonymous Auto Capture statement list, which is what Nuno's RFC does. That rounds out that list, and the other combinations, I'm not sure actually have a purpose. Technically could exist and this also then means, if in the future we wanted to add those, then we know exactly what the syntax is going to be for those and what they would mean it's all just following that established pattern, so it makes it really easy for people to learn and understand. So what we end up with, out of these two RFCs together, which can stand alone, and they're going to be put to votes separately, but as I said complement each other. If you have: something, double arrow, expression, that consistently throughout the language ends up meaning evaluates to this expression. Short lambda means: you know, this function evaluates to this expression, a named function, double arrow expression, array key double arrow expression, a match statements, and so on and so on and so on, double arrow always means evaluates to expression. And the key word function means declaring a function named or anonymous, that does not auto capture anything. The case of a named function, there's nothing to capture; in the case of a closure, you'd have the manual capture with a use statements exactly like we've had since 5.3. And the FN keyword means, this is a function that is going to Auto Capture. But oh fn statement list with curly braces. I know that means: Auto Capture, statement list, or keyword function with an arrow: I know that means not auto capturing expression, and all these combinations then just make sense and they're easy to learn and they fit together well. That's really what we were trying to make sure, that between these two RFCs we ended up with that consistent set of rules around the syntax that was not designed that way originally but plays out in that way, and we can now make sure it stays playing out in that way that is internally consistent, and therefore easy to learn, easy to document and so on.

Derick Rethans 9:35

Because all the things that you just mentioned, they're already in place for existing syntax.

Larry Garfield 9:40

Correct. And so we're just taking existing syntax that means a thing, and combining those existing pieces of syntax in a new way. In some cases that syntax didn't necessarily mean that deliberately, for example, the FN keyword on short lambdas, was not added for the purpose of declaring Auto Capture. It was added because we needed to have some kind of syntax there to keep the lexer happy, fine, but now that it's there, we can say: all right, that is going to now mean auto capturing function, because that's something consistent with the language as it is and the language as it evolves into.

Derick Rethans 10:14

Is the intention of this new syntax to replace long closures?

Larry Garfield 10:19

Not entirely. I suspect, a great many use cases of long closures could get away with then using the auto captured version, but there's no plan to remove the long closure version. Part because there are cases you do need the manual closure, particularly, it's still the only way to capture a variable by reference. All the other versions are by value, which 99% of the time is what you want, but in that other 1% If you do need to by reference, then you've still got the long version.

Derick Rethans 10:48

The long version that uses the use keyword.

Larry Garfield 10:51

Right, and then you're manually capturing things, are cases where you would want to, you know, use the same variable name inside and out but not refer to the same variable, so in that case you can use the long version, and then you don't have that collision. In practice however, the only languages I know of, that have explicit capture on closures are PHP and C++. As far as I know, every other language, including the other major scripting languages, Auto Capture. We're really the oddball out, and in practice, I think using Auto Capture is going to be fine. It's going to be easier, and we're not going to introduce any substantial amount of new bugs around it. The place where that might cause issues, is if you have a long function with a long anonymous function in the middle of it with Auto Capture and you can't keep track your variables, in which case you're already doing it wrong anyway, you should have shorter functions and shorter closures. A real use case for this is: I have a function that's a closure that's two or three lines long, because not everything in PHP can be an expression, sometimes it has to be a statement. So okay, this thing is going to be two lines long instead of one line long, but I don't want to have to convert to the super verbose long version and manually declare all of my imports, I just want to add a second line. And so this makes that use case, a lot easier and a lot more convenient.

Derick Rethans 12:11

I remember when discussing the match operator, which I think I also spoke to you about?

Larry Garfield 12:17

Match is the one you spoke to yourself about.

Derick Rethans 12:19

That's what it was, yes.

Larry Garfield 12:20

It was a fun episode.

Derick Rethans 12:22

When I discussed the match operator with myself, I think I looked at whether it was possible to extend a match expression with having multiple statements on the right hand side as well, where it is currently: it's the arrow with a similar expression. Is this something that you'd be looking at to tie this auto caption closure style into as well?

Larry Garfield 12:42

It's a related issue that has been discussed mainly around match, around supporting multi line expressions. And that'll be some kind of syntax which hasn't been defined yet to list a series of statements, which can then be wrapped up together and have a final statement that is returned, and then the whole thing gets evaluated, and can be used in place of an expression like in a match statement or a function body. If we had a syntax for multi line expressions, that would be an alternative way to get to the same place this RFC gets to, because you could say: FN, parameters, double arrow, multi line expression, with whatever syntax that ends up being. And that gets you essentially the same thing at the end of the day. Is that good or bad, I'm kind of torn on it. What we point out in the RFC, is this syntax for auto capturing multi line closures, gives us a kind of roundabout way to put a multi line body into a match arm, where what you, the single expression, that the match arm evaluates to, is a multi line closure with Auto Capture that you then immediately self execute. The syntax for that is a little bit quirky. It looks kind of like older style JavaScript. We have parenthesis, function definition, closed parenthesis, open paren, close paren, so it just executes immediately. It's not ideal for that use case. Personally I think multi line match expressions, or multi line match arms, are a rare enough need that on the rare occasion you need it, this would be good enough. And if it's not good enough, you really should break that logic out to a separate function anyway and just call that. Not everyone agrees with that. So that's more a more of an interesting side effect of this RFC than a goal per se. One have to use it in that fashion I probably not will not use that in that fashion, very often, but it's, we now have a solution for that use case, if you actually have that use case, and we don't need any dedicated syntax for that.

Derick Rethans 14:46

That could be part of a future RFC if people still feel inclined, that they need that. Talking about things in the future, is there any future scope with this RFC as well?

Nuno Maduro 14:57

There is really nothing planned on this RFC is future scope. Yet, there is something that I will like to personally explore in the future, that now we have this fn keyword, that means Auto Capture, or access to the outer scope. And I think something would be very cool, is to explore named functions in the way that they are declared globally, with something like fn get name, and then that function would be able to access the outer scope, but again this is something that personally I would like to explore but it's not included in this RFC. I just plan to explore this in the future now that we have this possible combinations that Larry just explained.

Derick Rethans 15:43

It's always interesting to see what people think of when you post RFC to the mailing list. What sort of where the biggest arguments against introducing this new syntax?

Larry Garfield 15:54

It's interesting. For both of these RFCs both my short functions, and now the Auto Capture multi line, the feedback from the public community on Reddit, on Twitter and so forth has been extremely positive people love: Oh, I can write less syntax and get stuff. It's been not universally but overwhelmingly positive. The feedback on the mailing list has been decidedly mixed with some people saying, cool this you know I've been waiting for that, and others saying: Why? Been push back: If your capture statement is that complex you, you're doing it wrong anyway. Or, if you do have Auto Capture, rather than explicit, your odds of capturing something you don't intend to are higher. And so you can introduce weird bugs that way.

Derick Rethans 16:42

Which aren't really arguments against having a multi line out to capture closure, with two or three statements. I mean if you're putting 50 lines in there then sure, you can make that argument I guess.

Larry Garfield 16:53

Exactly, and that's kind of our response is: if you have a complex enough piece of code that Auto Capture becomes problematic. You have a complex enough piece of code that you really should be manually capturing, or just refactor your code so you don't have that much complexity. That since it kind of becomes a good indicator of need to refactor. Then there's always the argument of: why should you add more syntax for anything, you know, we've got one syntax let that rule everything and that's that comes up with every RFC. Points are valid, to an extent, but I think the convenience factor of being able to write code more naturally with less effort that does stuff that right now is just clunky, is a stronger argument, especially given that most other languages don't have manual capture and get along fine. People have mentioned JavaScript as an example where the Auto Capture used to be highly problematic, then resolved with an extra keywords you can declare a variable inside a closure with let, that is then locally scoped and overrides anything in a parent scope. I don't think PHP needs that in part because we don't use closures as overwhelmingly as JavaScript does, and honestly that problem has kind of gone away in JavaScript, as they've introduced real classes, and other more traditional techniques that obviate the need for those kind of closure inside closure inside closure inside the closure nonsense. Python doesn't actually have multi line lambdas as far as I'm aware, because they have named functions that are scoped local. Ruby, as far as I know just does Auto Capture and doesn't have any special syntax around it. So, I have not heard of them having any problems. As I said, C++ has manual capture and that's the only one I can think of that has it. I think looking at other languages, the problems people have pointed out are more hypothetical than real, and I'm hoping that, you know, voters on the list will see all right. This makes life easier and the problems with it are hypothetical, not real problems that we've seen in the wild. So let's just make life easier for people.

Derick Rethans 19:05

Is there any chance of this breaking BC, somehow?

Larry Garfield 19:08

It shouldn't, the syntax right now would be syntax error. I don't see any, any BC breaks possible.

Derick Rethans 19:16

That's always a good thing isn't it.

Larry Garfield 19:18

Yes.

Derick Rethans 19:19

You were talking about appealing to the voters on the mailing list, which have the right to vote on features usually. When do you think you will be putting this up for a vote?

Larry Garfield 19:29

Probably around the end of April. We can put probably both RFCs up for a vote, you know, let the chips fall where they may. As you said both RFCs are stand-alone. If one pass and the other fails everything still works. Obviously we both like both the pass.

Derick Rethans 19:43

And with both you mean: both this RFC, which is the output capturing multi statement closures RFC, as well as the short functions RFC that we spoke about in episode 69.

Larry Garfield 19:53

Correct.

Derick Rethans 19:54

Thank you for taking the time today to talk to me about the new RFC that you're proposing.

Larry Garfield 20:00

Thank you, Derick always good to talk.

Nuno Maduro 20:01

Yeah, thank you so much for having me.

Derick Rethans 20:06

Thank you for listening to this instalment of PHP internals news, a podcast, dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patron. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 81: noreturn type

PHP Internals News: Episode 81: noreturn type

In this episode of "PHP Internals News" I chat with Matthew Brown (Twitter) and Ondřej Mirtes (Twitter) about the "noreturn type" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:15

Hi I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 81. Today I'm talking with Matt Brown, the author of Psalm and Ondřej Mirtes, the author of PHPStan, about an RFC that I propose to alter the noreturn type. Matt, would you please introduce yourself?

Matthew Brown 0:37

Hi, I'm Matthew Brown, Matt, I live in New York, I'm from the UK. I work at a company called Vimeo, and I've been working with for the past six years on a static analysis tool called Psalm, which is my primary entry into the PHP world, and I, along with Ondřej authored this noreturn RFC.

Derick Rethans 1:01

Alright Ondřej, would you please introduce yourself too?

Ondřej Mirtes 1:04

Okay, I'm Ondřej Mirtes, and I'm from the Czech Republic, and I currently live in Prague or on the suburbs of Prague, and I've been developing software in PHP for about 15 years now. I've also been speaking at international conferences for the past five years before the world was still alright. In 2016, I released PHPStan, open source static analyser focused on finding bugs in PHP code basis. And somehow, I found a way to make a living doing that so now I'm full time open source developer, and also father to two little boys.

Derick Rethans 1:35

Glad to have you both here. We're talking about something that clearly is going to play together with static analysers. Hence, I found this quite interesting to see to have two competitive projects, or are the competitive, or are the cooperative.

Matthew Brown 1:56

I think half and half.

Derick Rethans 1:57

Half and half. Okay.

Ondřej Mirtes 1:59

Competition is a weird concept in open source where everything is released for free here that

Derick Rethans 2:04

That's certainly true, but you said you're making your living out of it now so maybe there was something going on that I'm not aware of. In any case, we should probably chat about the RFC itself. What's the reason why you're wanting to add to the noreturn type?

Ondřej Mirtes 2:18

I'm going to start with a little bit of a detour, because in recent PHP development, it has been a trend to add the abilities to express various types natively, in in the language syntax. These types, always originally appeared in PHP docs for documentation reasons, IDE auto completion, and later were also used, and were being related with static analysis tools. This trend of moving PHP doc types tonight this type started probably with PHP seven that added scalar type hint. PHP 7.1 added void, and nullable type hints, 7.2 added object type, 7.4 added typed properties. And finally, PHP, 8.0 added union types. Right now to PHP community, most likely waits for someone to implement the generics and intersection types, which are also widely adopted in PHP docs, but there's also a noreturn, a little bit more subtle concept, that would also benefit from being in the language. It marks functions and methods that always throw an exception, or always exit, or enter an infinite loop. Calling such function or method guarantees that nothing will be executed after it. This is useful for static analysis, because we can use it for type inference. I have an example, when you're accepting nullable object as a function parameter, you probably want to eliminate the null value before you can safely call a method on it. So, you will write if $object, three equal signs null, somehow handle this situation, and at the end of the if statement, you will return, or throw an exception. But instead of return, or throw, you might choose to call framework specific or a library specific function, that also always throws or exits the process. This will also tell the user, the IDE, and the static analyser, that below the if statement, the variable can no longer be null. For example, if you ever called mark test skipped in PHP unit, or if you call the abort function in Laravel, you've already used the function that would benefit from being marked with noreturn keyword.

Derick Rethans 4:24

You mentioned that currently people use the docblock no it @noreturn for that. Why would it be better to have it in the language?

Matthew Brown 4:31

Jumping off, Ondřej's point. PHP has this has this thing, right, you know things where the doc block, but PHP also, it's a language where developers are used to the language telling them if they did something wrong. So whereas other languages, you might need, like for example, JavaScript, they can be a bit more permissive. Developers when they write PHP code, they're used to getting errors instantly. They call a function with an object instead of a string, and expects a string, and it's marked in the signature as expecting a string, when they run that they get an error. And so that's just a kind of way that most PHP developers write cod. With a noreturn type, we sort of thought that there's a benefit to developer, having written a noreturn type, instantly getting an error if they actually do something that returns. So it follows that pattern that PHP has adopted, of, if I do something that violates a type that I've annotate, that I've explicitly added to the function, PHP should error. There's also a useful sort of side effect here, which is that when you add noreturn to a function, it's guaranteed that it will never return the context. If you call it, it will never not return because it will either whenever not throw an exception or exit, because if the noreturn is invalid, if it does actually do something where it's returning somehow, PHP will then throw a Type error. Cause it's supported by the language. If it wasn't supported by the language, you'd be able to use a function that called noreturn, and it wouldn't actually return. I mean obviously Ondřej and I are big fans of static analysis. The language itself isn't just to pat ourselves on the back and think, you know, we had the right idea when we were doing static analysis, it's because it can help PHP developers write code.

Derick Rethans 6:16

The void return type can only being used in specific locations, I mean you can't type a property for examples void. Are the similar restrictions on where noreturn can be used?

Ondřej Mirtes 6:27

Yeah, right now it can be used just as a return type. There might be some other possible usages, but they are not part of this RFC. For example the noreturn bottom type could be used as a parameter type to denote a function that shouldn't be called. So, this might be some relevant use case, but I've already had a feature request for PHP Stan, to actually support this type as a parameter type for callbacks that should never be called, but I don't remember why that person wanted this. Once we have generics, or at least the possibility to type what's in an array, we could also use the no return type for that. For example, array that contains noreturn, or never, would mean that the array is empty. And also during static analysis, the type inference engine also produces this type internally, basically to mark dead code. So for example if you ask better variable that can only ever contain an integer, if that variable can be a string, you're creating a condition that cannot be executed, that will be always false, and the resulting type of the variable inside that condition is the same type as noreturn or never.

Derick Rethans 7:41

You mentioned never there we haven't spoken about yet, but we'll get back to that in a moment I'm sure. Is there any prior art for this?

Matthew Brown 7:47

Yes, a number of languages have a noreturn type. Hack has specifically a noreturn type, Hack, if anyone listening doesn't know, hack is a language created as a sort of offshoot of PHP. Engineers at Facebook, when they were running into issues with PHP from about the moment they started using it in 2007/2008 as the site started growing, and performance really became an issue. And so eventually they created their own version, basically. And one of the benefits of working at Facebook is, you have lots and lots of smart engineers, and they added a lot of different typing functionality to this new language. And so one of the things I added was a noreturn type, as well as adding generics and many, many other things. Another language with prior art is type script. TypeScript has a never type, which is essentially the same. It's a bottom type as Ondřej was talking about. And a bottom type is the subtype of all subtypes. You have a class structure, you have exception, and then you have a child class of logic exception, and noreturn, is the subclass of subclasses of the child class, the thing right at the bottom of the type hierarchy, and so it can always be returned when you would expect some other thing. But basically, this is the understanding of what a bottom type is. I talked about interpreted languages to interpreted languages, but also many compiled languages, most recently Rust, that have the notion of a bottom type. It's a type, where you're guaranteed that program execution ends, in some way shape or form.

Derick Rethans 9:23

You mentioned that noreturn is the bottom type, how does that play with variance that PHP implements?

Matthew Brown 9:32

The concept of variance for return types is essentially, if a parent method returns something like a, an exception class, the child classes can either return an exception class, or they can return children for that same method of the exception. So let's say I have a method getException, that is described as returning an exception, the child methods in our child class, so child::getException can either return an exception, or they can also say they return a child class, so they can say, I actually return a logical exception, and this is valid according to Liskov substitution principle, which is to say: you're allowed to return a child type of whatever the overridden method was. So where this comes into play with noreturn is, noreturn is defined as is the bottom type is at the very bottom of all those class structures, you can always return a bottom type, basically And this makes sense if we just think about it, you're not breaking a contract, if your function always returns or exits; the variance rule to kind of follow that.

Derick Rethans 10:43

How would that compare with void? Because void has some interesting variance rules as well right?

Ondřej Mirtes 10:49

Actually, no or little similarities between void, and noreturn. Because when you are calling a void function or method, you expect it to do something, and then actually continue in the execution flow. Not expect to read the return value, but with noreturn, you call a method, and you don't expect it, the execution flow to continue. These are completely different, and I actually don't know how people can mistake one for the other.

Derick Rethans 11:22

Yes, seems very, very different to me as well. The RFC talks about alternative ways of introducing noreturn. And one of the things that had mentioned, is using the attribute. Attributes, being introduced in PHP 8.0. Why did you decide not to implement it as an attribute or suggested as an attribute instead?

Matthew Brown 11:43

Attributes I think are really cool. I think attributes have a place in the language, obviously they have a place as the RFC described, in place a docblocks, they can be reflected very quickly at runtime. And I also I'm interested in ideas like a deprecated attributes. And also I've just been kind of toying around in my head, the idea of a pure attribute, which could guarantee at runtime that a function with that attribute, was pure. It would never, for example, use a property, or it would never use like a static variable. We could guarantee purity of functions which would interest the pure functional programming people

Derick Rethans 12:26

Could you explain what a pure function is?

Matthew Brown 12:28

A pure function is a function that doesn't use any other data but the data you provide it. If I have a multiply function that takes two parameters, x and y, and it returns the multiplication of those things, you would call that function pure. There are many ways the function can become impure. One of the ways is it can have output, you can have IO output for example so if the body of the function you then echo the value of x, before returning, that function becomes impure because it's changed the environment that it operated in slightly. Additionally, if you metal memorize the value of x. So let's say you have x and y as inputs, and then in the first line, you take the value of a property elsewhere, and you add x to that value, and then you multiply that result, then that function is also impure, you're using data from outside the function to return this value. So the idea of a pure function is one which essentially can be modelled mathematically, and that's why some kind of purists, like the this idea because it allows things to be modelled mathematically, but more importantly, then it allows those functions to be tested very effectively. Some implements purity, so that you can add a docblock annotation to function and it will tell you that, whether or not the function is pure. This has extra benefits when writing really complex code. So the most complex code that Psalm has, which performs some boring computation, I've added these pure annotations everywhere. And what it does, it forces me to write the code in a way that avoid side effects. The hope is from my end that writing this very complicated code in a pure fashion, makes it easier to debug at some later date.

Derick Rethans 14:20

Thanks for that.

Matthew Brown 14:21

I think attributes are great and have these uses. I don't believe that attributes are useful to encode types, because PHP has a place where you can already represent types, you know, we've introduced into the language itself, the notion of typing, you know obviously many years ago. I think there is a benefit to where possible, keeping the types as types. There was a suggestion that noreturn could be an attribute instead, because it in some way it's it's really about behaviour. But it's still a type, and in the wider programming community, there is prior art for it to be considered a type. So there's basically no benefit to my mind so making it an attribute. And as well, the implementation as a type is very small, you know, it's less than, well under 100 lines of actual written PHP to implement this feature because it uses the existing checks that we already use. We also use for other return types, and to make an attribute we kind of take it out and very much expand the implementation. There are two good reasons there to not want to use an attribute.

Ondřej Mirtes 15:31

There are not very useful combinations. If noreturn was an attribute, then what would you write as a return type. There are not many useful combinations of what it could be.

Derick Rethans 15:44

And it also can't be used with any kind of variance rules any more.

Matthew Brown 15:48

Or at least if it were to be used for variance rules then we would have to write that logic. You'd be like why are we writing this logic in this particular way, it wouldn't make sense.

Derick Rethans 15:57

Because noreturn is a type, and not a behavioural thing. Makes perfect sense.

Matthew Brown 16:03

But it's both a type and a behaviour. In the same way that when you actually say, this function returns a thing, PHP then does a behavioural or check to make sure that that function always returns. You could argue that every type is essentially a behaviour, because you're saying the behaviour of this function has to return a value.

Derick Rethans 16:21

Earlier, one of you mentioned instead of noreturn, the never keyword. Is that the only alternative that that was discussed are the further ones?

Ondřej Mirtes 16:32

Well there's noreturn and never and the RFC is now going through the voting process, so the secondary vote is about the name, and some languages also use nothing. It feels more natural to say that that function never returns, or using the noreturn keyword, then, saying that it returns nothing which blends closer to the void, void keyword

Derick Rethans 16:58

Earlier you were mentioning that for future scope you wanted to use this new keyword that you're suggested to introduce also in all the locations where perhaps noreturn does not make sense.

Ondřej Mirtes 17:08

Yes. Also. What no return has going for it is that it's unlikely to be used as a class name somewhere so making it, whereas if key word isn't an issue, but just as you said, it looks like a key word for a single purpose being written in a return type thing that it's quite obvious which one of us two like which keyword, because I like never more. And one reason is that it's a single word and it reads more naturally in the source code, and it's also looks more like a full fledged type and TypeScript, uses the same keyword.

Derick Rethans 17:42

Why did you put noreturn in the RFC?

Ondřej Mirtes 17:44

Because Matt likes it more.

Matthew Brown 17:47

Yeah, I wrote the first draft of the RFC, I got first dibs, but this is a big point of contention with Ondřej and I, and we're almost at the point of not speaking to each other, because I'm on one side and he's on the other. And it looks at the moment like never will succeed. I think the TypeScript thing is a good point. When I wrote the RFC originally, I wasn't thinking that so many PHP developers write TypeScript. I hadn't really factored into my head. And I think, given that it does make more sense that never is used.

Derick Rethans 18:21

Looking at how recurrent voting is going, never has 32 votes going for it, and no return has 14 votes going for it.

Ondřej Mirtes 18:29

Just kidding. I can't wait to have a beer with him again, once the world is, is fine again.

Derick Rethans 18:35

Me as well.

Matthew Brown 18:36

He can't start inventing new words; like yeah ironically naming is hard right.

Derick Rethans 18:41

Definitely the case. At the moment it's very clearly looks like that, the new keyword is going to be never, with 40 votes for introducing a keyword to begin with and 10 against, so that looks like a done deal. Would either of you have anything else to add?

Ondřej Mirtes 18:57

Yeah, Derick, last time I refresh the wiki, I noticed that you haven't voted yet so what is going to be your vote?

Derick Rethans 19:04

I intend not to vote until I've spoken to the people on the podcast.

Matthew Brown 19:09

Great, great.

Derick Rethans 19:10

I will make sure to vote. Having said that, thank you very much for taking the time today to talk to me about this RFC.

Matthew Brown 19:17

Thank you. It was a pleasure.

Ondřej Mirtes 19:19

Yeah, I've been following this podcast closer since the beginning, so I'm happy I was able to finally join, and have something to talk about here. Thank you.

Derick Rethans 19:26

Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 80: Static Variables in Inherited Methods

PHP Internals News: Episode 80: Static Variables in Inherited Methods

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Static Variables in Inherited Methods" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi I'm Derick, welcome to PHP internals news, the podcast dedicated to explain the latest developments in the PHP language. This is episode 80. In this episode I speak with Nikita Popov again about another RFC that he's proposing. Nikita, how are you doing today?

Nikita Popov 0:30

I'm still doing fine.

Derick Rethans 0:33

Well, that is glad to hear. So the reason why you saying, I'm still doing fine, is of course because we basically recording two podcast episodes just behind each other.

Nikita Popov 0:41

That's true.

Derick Rethans 0:42

If you'd be doing fine 30 minutes ago and bad now, something bad must have happened and that is of course no fun. In any case, shall we take the second RFC then, which is titled static variables in inherited methods. Can you explain what is RFC is meant to improve?

Nikita Popov 1:00

I'm not sure what this meant to improve, it's more like trying to fix a bug, I will say. This is a really, like, technical RFC for an edge case of an edge case, so I should say first, when I'm saying static variables, I'm not talking about static properties, which is what most people use, but static variables inside functions. What static variables do unlike normal variables, is that they persist across function calls. For example, you can have a counter static $i equals zero, and then increment it, and then each time we call the function, it gets incremented each time and the value from the previous call is retained. So that's just the context of what we're talking about.

Derick Rethans 1:43

Why would people make use of static variables?

Nikita Popov 1:46

I think one of the most common use cases is memoization.

Derick Rethans 1:50

Can you explain what that is?

Nikita Popov 1:51

If you have a function that that computes some kind of expensive result, but which is always the same, then you can compute it only once and store it inside the static variable, and then return it. Maybe possibly keyed by by the function arguments, but that's the general idea. And this also works if it's a free standing function. So if it's not in the method where you could store state inside the static property or similar, but also works inside a non method function.

Derick Rethans 2:22

The keyword here in his RFC's title is inherited methods, I suppose. What happens currently there?

Nikita Popov 2:29

There are a couple of issues in that area. The key part is first: How do static variables interact with methods at all? And the second part is how it interacts with inheritance. So first if you have an instance method, with a static variable, then some people expect that actually each object instance gets a separate static variable. This is not the case. The static variables are really bound to functions or methods, they do not depend on the object instance. Second problem is: What happens with inheritance? So you have a parent class with a method using static variables and then you have a child class that inherits this method. There are two ways you can interpret this, either this is still the same method, so it should use the same variables, or you could say okay the inherited method is actually a distinct method and should use separate variables. PHP currently follows the second interpretation.

Derick Rethans 3:24

And is this even the case, if it's overridden, or just when it's inherited? Because there's a difference there supposed as well.

Nikita Popov 3:30

Yeah, this is the basic model that PHP tries to follow but there are quite a few edge cases. The one that's what you mentioned if you override the method, then of course, you're calling the overridden method so the static variables don't even come in. But if you then call the parent method, then usually you would expect, if you do an override and just call the parent that the behaviour is exactly the same as if you didn't override it at all. That's not the case here, because now you're calling the parent method. So the problem you have here is that if you didn't override the method, then the child method and the parent method would have different static variables. Now if you call parent, then you're just calling the parent method, so you get back to one set of static variables, which are the same for both methods. You can see they're the same for both methods, but rather because you're calling the parent method, there is only one method involved, only one set of static variables involved. You can't really just like seamlessly extend a method that uses static variables, without changing the behaviour by accident.

Derick Rethans 4:37

This sounds all very complicated.

Nikita Popov 4:39

Yes I said, I did warn you that this is an edge case of an edge case.

Derick Rethans 4:44

But I think the whole idea behind the RFC is to make it less complicated.

Nikita Popov 4:48

Yes, this is like not the, the only issue you can run into. There is another one that we have actually addressed separately, but which still exists on earlier PHP versions, which is that the value of the static variables depend on the time of inheritance. Let me be a bit more explicit there. What we had in previous versions is that half your parent method was a static variables, then you call that parent method, static variables change, then you inherit it. And again, call the inherited method. In that order: first declare the parents, call it, declare the child, call it. In that case we will actually take the static variables at the time, where the inheritance actually happens. The first call onto the parent method modify the static variables, then we will use some modified variables. From that point on, it will have a separate copy, the child method, but it will like pick up these original modifications before inheritance happens. Now in PHP 8.1, we actually already fixed that so that we always use the original values, but this is like just one more thing to the list of weird things that happen, if you use static variables inside methods and you inherit them.

Derick Rethans 6:06

I think I understand more of it now.

Nikita Popov 6:08

I think you understand more than you ever wanted to know.

Derick Rethans 6:12

You've mentioned the edge cases. What is the result, going to be once this RFC passes, which I'm going to think is quite likely

Nikita Popov 6:21

The result is, hopefully, simpler than what we have, namely that static variables are really bound to a specific function or method declaration. If you have one method using static variables, then you have only one set of static variables ever. If it's inherited, you still reuse the same static variables because there is no separate inherited method. It's just the same method in the child class. That's the concept.

Derick Rethans 6:52

And if you override it in an inherited class?

Nikita Popov 6:55

If you override it, and you call the parent method, then the behaviour is unchanged because you still have just a single set of static variables, so there is no edge case here any more because the child, the child method never had a separate set.

Derick Rethans 7:10

But if the overridden method also defines its own static variable with the same name?

Nikita Popov 7:17

That's possible. In that case, once again this rule is that each method has its own static variables and methods can have static variables and the child method can have them as well, if they are overridden and there are no name clashes between them.

Derick Rethans 7:32

Because they are going to be totally separated, meaning that any code you run in the inherited methods will only affect its static variables, and any code that runs in the original methods only affects the static variables that are bound to that specific method.

Nikita Popov 7:50

Exactly. I mean, in the end, static variables are really the type of global state, just a type that is kind of isolated to a specific namespace and doesn't cause clashes, so in that sense, it's important that these things are isolated.

Derick Rethans 8:06

And that would also make the behaviour, a lot more easier to explain than it currently is. Because every methods, has its own set of static variables.

Nikita Popov 8:15

Yes.

Derick Rethans 8:15

Or I should say, every declared methods, has its own set of static variables.

Nikita Popov 8:20

I guess that is an important distinction. If you can see the methods inside your code and see the static variable inside it, then that is a distinct one. If we ignore the exception of traits.

Derick Rethans 8:34

You're going to have to explain that as well.

Nikita Popov 8:37

Well traits are always a special snowflake. Our general model for traits is that they are compiler assisted copy and paste. So a trait should roughly behave as if you just copied all the methods into the class that's using the trait. And in that sense, if you are actually copying the code of your method with a static variables, then it should also use a distinct set of static variables for each use. And that is also how it is proposed to behave. So that is like the one exception where you have a single method declaration in your code, but each using class will get a separate set of static variables.

Derick Rethans 9:15

Because the code is copied in place, instead of linked, or used in place. It's also the case for all the methods declared in traits, they're also copied into the same symbol table as the methods belong to a class.

Nikita Popov 9:32

Yeah, that's right.

Derick Rethans 9:33

Should be reference counted in some way because you probably won't duplicate the exact data.

Nikita Popov 9:38

We of course don't actually copy the methods, or at least most of the methods, but from the programmer perspective that's how it works

Derick Rethans 9:46

Why do you say most of the methods, and not all the methods?

Nikita Popov 9:50

We separate two things there. There is the method itself. So the op-array, and then there is all the stuff it uses like the opcodes the like arguments information and so on. What they do for traits, is we share all the data, and only create a separate op-array. Reason is that there are some differences. For example, we have to adjust the scope, we have to adjust possibly the function name if aliases are involved, and we have to adjust the static variables. So it's like kind of a partial copy we do.

Derick Rethans 10:22

Which is probably the most efficient way of doing it?

Nikita Popov 10:24

Yes.

Derick Rethans 10:25

Because this RFC is changing behaviour due to bug fixes, I would probably argue, what kind of backwards compatibility issues are there? And have you looked at how much code that actually impacts?

Nikita Popov 10:39

I haven't looked how much code it impacts because this seems like pretty hard to really analyse. I mean I guess something we could easily check is how much static variables are used in methods at all. But it would be hard to distinguish whether this change. I mean how to distinguish in a completely automated way. Whether this change makes behavioural difference for a particular use case or not. So I can't really give information on that, though I would expect that impact is relatively low because the common use cases, things like memoization, they aren't affected by it, or they are only affected by it in the sense that: Then you will memoize a value only once for the whole class hierarchy instead of memoizing it once for each like inherited class.

Derick Rethans 11:32

So, it's going to improve the situation there as well, is pretty much what you're saying?

Nikita Popov 11:36

Yeah, but I'm sure there are also cases where the previous behaviour was like intentionally used. I mean it was never documented, but you know if some behaviour exists, people will always make use of it in the end, but I can't really say exactly how much impact this would have.

Derick Rethans 11:55

Do you have anything else to add, discussing this RFC?

Nikita Popov 11:59

No, I think that's it.

Derick Rethans 12:00

Then I would like to thank you for taking the time today, again, to talk to me about static variables in inherited methods.

Nikita Popov 12:08

Thanks for having me once again.

Derick Rethans 12:16

Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.