PHP Internals News: Episode 93: Never For Parameter Types

PHP Internals News: Episode 93: Never For Parameter Types

In this episode of "PHP Internals News" I chat with Jordan LeDoux (GitHub) about the "Never For Parameter Types" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 93. It's been quiet over the last month, so it didn't really have a chance to talk about upcoming RFCs mostly because there were none. However, PHP eight one's feature freeze has happened now, a new RFCs are being targeted for the next version of PHP eight two. Today I'm talking with Jordan LeDoux, about the Never For Parameter Types RFC, the first one targeting this upcoming PHP version. Jordan, would you please introduce yourself?

Jordan LeDoux 0:50

Certainly. And thanks for having me. My name is Jordan. I've worked as a developer for about 15 years now. Most of my career has been spent working in PHP. Although professionally, I've had experience working in C#, Python, TypeScript, mostly in the form of JavaScript, but a little bit of Node and, you know, a variety of other languages that I haven't spent enough time in to really be proficient in any real way. But recently, I decided to do something that I have thought about doing for many years, but never actually jumped into which is exploring the PHP engine itself and how I could possibly contribute to it.

Derick Rethans 1:32

And here we are, but your first our thing.

Jordan LeDoux 1:35

Yeah, it's exciting.

Derick Rethans 1:36

What is this RFC about, what does it propose?

Jordan LeDoux 1:39

Well, this RFC proposes allowing the never type, which was added in 8.1 as a return value, to parameters for functions and methods on objects. The main idea behind that is that when never was proposed as a return type, it was meant to signal that the function would never return. Not that it returns void, which of course, void signifies which is returning no value or returning, returning without any specified information. And never return signifies that the function will never return, which is a concept that exists in many other languages. And for that purpose in other languages, what's usually used is something called a bottom type. And that's what never ended up being. And I'm proposing that we extend the use of that bottom type to other areas where the type may be helpful.

Derick Rethans 2:38

So a bottom type, that might be a new term for many people, it will certainly for me when I looked at the never RFC for as return types. Can you sort of explain what a bottom type is, especially thinking about object oriented theory with something that we'd like to call the Liskov Substitution Principle? And also, how does it apply to argument types?

Jordan LeDoux 2:59

Let's start with the Liskov Substitution. The general idea behind Liskov Substitution is that if A is a subtype of B, then anywhere that A exists, you should be able to substitute B. It has to do with when you have a class hierarchy in in an object oriented language, that that class hierarchy guarantees certain things about substitutionality, like whether or not something can be substituted for something else. That affects language design in ways that a lot of programmers are kind of intuitively familiar with, but maybe not familiar with the theory and the ideas behind it more concretely. But LSP is the principle in SOLID, that's the L and SOLID. And it represents a portion of the whole idea of object oriented programming in PHP. Part of being able to substitute one object for another, based on their class hierarchy, and what they implement, and what they provide is there part of their ability to be substituted is whether or not they can fulfil the same kind of contractual requirements of typing. And with Liskov, that means that preconditions can never be strengthened, and post conditions can never be weakened. So a precondition would be a parameter type requirement. If you require that a parameters accepts an object, for instance, in PHP, you can't strengthen that requirement beyond just any object to a particular object. But you can weaken it from an object to an object or an integer with with unions. That's an example of the precondition side of it. The post condition side of it is that you can't, you can't weaken it. So if you have have, you know, if you have a return type of int, you can't have an inherited implementation return int or float, because that broadens the possible return types. They go in opposite directions. And one of them is covariance and one of them is contravariance.

Derick Rethans 5:18

I can never remember which one is which.

Jordan LeDoux 5:21

Yeah, basically contravariance go up the tree and covariance go down the tree. If you're thinking about widening, or sorry, narrowing. If you're thinking about narrowing, then covariance go down the implementation tree and contravariance go up the implementation tree.

Derick Rethans 5:40

Okay, so how does the bottom type fit in here?

Jordan LeDoux 5:43

The bottom type in any type system represents like the base type that all types originate from. And the best way in my mind to think about it is kind of just integer math. It's a thing that every programmer is going to be familiar with. And it fits, it fits all right. So you can think of the bottom type as zero integers, a lot of people would think of null as zero if they're thinking about a type system, but null is more like negative one. It's like the entire negative side of the integer system. We could say that the string type is one, and the integer type is two. And the float type is three, and maybe int or float, the union of them is five, which would be the two numbers added together. And if you describe type systems this way, then you can say, hey, if I take any type and add the value that represents the other type, then I get my result type. So if I take zero, the bottom type, and I add one, the string type, what I end up with is one, still the string type. So the bottom type is whatever type system or whatever, whatever type, when you add any other type to it, you get the type you added to it and nothing else, you just get your original thing. That's why it's called the union identity for the type system. The top type in PHP is mixed. And that's the opposite side of it. It's just like zero is the additive identity. One is the multiplicative identity. If you multiply anything by one, you're going to get what you originally had. And if you add anything to zero, you'll get what you originally had. So mixed ends up being, or the top type in general, ends up being the intersection identity, and the bottom type, or never, in PHP's case, ends up being the union identity. And all this is like deep type theory, but most programmers don't have to interact with it. It's more something that affects language design usually.

Derick Rethans 7:48

Could you think of mixed as being infinity?

Jordan LeDoux 7:50

That's actually with my with my crude integer analogy, yeah, it would be like all types, all possible types are added together.

Derick Rethans 7:59

That makes sense then. Okay, so we have explained what the bottom type is, but, and never being the bottom type. So why is it useful to use the bottom type, or never, as this RFC proposes, as a method argument type for parameters?

Jordan LeDoux 8:14

The largest benefit has to do with what we were talking about when it comes to covariance versus contravariance. You know, can you strengthen the requirements? Or can you weaken the requirements? When you inherit a method in a system that preserves Liskov Substitution, the parameters can be widened, they can accept more things. If I had an interface that said, it has one parameter, and that parameter is typed as int, then in any implementation, I could say, okay, but actually, the parameter type is int or float, I'm going to accept both. And I could do that in the implementation. But I would have to accept int, because that's part of the contract. That's part of the interface. So I have to accept whatever type is in the interface. I can just additionally add things on top of that. If I make my original definition, my root definition, the bottom type, then I can add any type to it. And I will just get that type. From never, if I had an interface with, you know, a method foo, and it has one argument, and that argument is typed never, I can re-type that argument as int, or I could re-type that argument as string, and all of them would be valid inheritances.

Derick Rethans 9:38

So it's a way of getting around having at least one concrete type like int in an interface.

Jordan LeDoux 9:44

Right. And in fact, never is a concrete type. It's a concrete type, that means this code can never be called. For return type, it means this type will never return. But if you're, if what you're trying to do is call it then you can never call it.

Derick Rethans 10:01

Which is then why never actually makes sense as a type name, because one of my further questions is going to be how does never as a name make sense? But you've now explained it in such a way that it actually does make sense. So there we go.

Jordan LeDoux 10:15

One of the questions that did come up in the internals discussion on this so far, has been a round that choosing never, and because a lot of the examples for use cases are around inheritance, it makes some kind of intuitive sense, if you're describing the inheritance behaviour, to name it something like any or anything, or you know, something about its how permissive it is in its inheritance definition. But the type should really describe the code that it's actually written in. You know, getting outside the idea of an interface or an abstract or something like that. If you have a function, and that function types it as never, or whatever you name, the bottom type. There's no data that can satisfy that. Because any data will have some type other than never. It'll have string, or int, or something. Even null has the null type. You can't provide any data to a function that requires never that will satisfy its type requirement. So that code can never be called. It's really when you start considering how does this affect inheritance that you start getting into this concept of Oh, maybe a different word makes sense. But then that doesn't really reflect the opposite side on the return side, when you're talking about covariance instead of contravariance. And that's why the bottom type for most languages is something like never, or nothing, or nil, something along those lines.

Derick Rethans 11:48

So if you type an argument to a method as never, the engine will, of course, enforce that you can't call it with any data because it wouldn't satisfy. But would it also automatically make a method abstract so that inheritance inheriting classes have to widen it or not?

Jordan LeDoux 12:04

So that was part of my original idea behind the RFC, was forcing something that implements it or something that inherits it, to widen it. However, there were a lot of good arguments about why that may not be a good idea. One is that in PHP, an empty type isn't an empty type, it's mixed, not widening, would actually just be saying the type is mixed, which is valid, you can go from the bottom type to the top type in a contravariant way, that's a totally valid way to do it. The problem is that PHP has a weakly enforced typing system. Like that's only a problem in this context, it's actually a very powerful feature of the language in a lot of other contexts. So we don't necessarily want to get rid of that. In addition to that, the actual mixed type as a literal that can be used in the language was only added in 8.0. It would kind of represent a much larger backwards compatibility break to require it to be explicitly widened. That was part of my original concept for a lot of the reasons that you were just talking about. But for PHP, specifically, it would probably present more problems than it would provide kind of solutions and utility. I've kind of been convinced off that point a little bit by the arguments of others.

Derick Rethans 13:22

Would it be possible to instantiate a class that has a method with its argument typed as never?

Jordan LeDoux 13:28

Yes, as long as he never called that argument directly.Using a never type in a constructor would definitely be a definitely be a No, no, that would result in a type error. As soon as you try to instantiate the class.

Derick Rethans 13:41

It would basically make the constructor private.

Jordan LeDoux 13:43

Yeah, you would get a slightly less useful error.

Derick Rethans 13:47

Yes.

Jordan LeDoux 13:48

Then you do if you make the constructor private, and then try and call it.

Derick Rethans 13:53

It makes no sense to do it. The RFC slightly touches on generics. And it goes in a way talking about why this is sort of slightly like generics. Could you explain the interaction between these two concepts?

Jordan LeDoux 14:07

Generics is a feature obviously, that a lot of PHP developers want. And it's also a very complicated feature to do. Way outside of what I was willing to consider for, for my first, my first attempt at something useful. One of the most common ways that generics are used, is within an inheritance structure to allow something similar to type widening, particularly for parameters. Being able to say, I want the type to be able to be widened, but I don't know exactly how it will be widened. That's something that generics offer. Generics offer many other features as well and many other capabilities. But that particular one is something that this can do. It comes with a cost though, because this isn't generics. It is not really the right way to do that. It can be done without generating compile errors now, if you use never, that's the main difference. You still would encounter errors in static analysis and IDE hints, for instance. The IDE, he wouldn't be able to tell if you typed against a interface that had never as a parameter type, it wouldn't be able to tell what sorts of types the implementers have. Because that's not really the point of accepting an interface as a type for a parameter, or for a function call, or something like that. The point is that any implementer of this will be an acceptable type. But that means that from a static analysis perspective, it won't know what the type requirements are, because it won't know what concrete implementations are being provided in the code. So obviously, this is a limitation. And this limitation does not exist in a in an actual generics implementation of some kind. I do view it as an improvement personally. And the main reason is that before, if you tried to do something like this, then the errors that you generated and the problems that you caused, were in your code, in its actual execution. Now, the errors and the problems that you need to solve are going to be in your static analysis or in your IDE. And that's, that's something that's a lot safer for the code in general. It can present some maintainability challenges, it can be annoying for developers to deal with, all of that's absolutely true. And it doesn't provide all the things that you would want from being able to do that. It moves the where the safety problems are from being in code execution, where it can cause real problems into code writing, which is where you have the opportunity to kind of think about it, reason about it and catch it.

Derick Rethans 16:54

From what I understand is that if you have a class doing implementing some kind of generics, then you'd often expect that where this generic type is used in either argument parameter, or return value, that'd be the same for all the methods, whereas with never, you can of course not enforce it, that it isn't being the same type being used, in all the other places where you would otherwise expect or enforce with generics to be the same type. So I reckon that's one of the differences. I think that was a better explanation that what I read from the RFC.

Jordan LeDoux 17:25

That was a explanation and in argument that I wasn't forced to actually articulate until I presented it to other people, because I went through several days of research before even writing the RFC. And sometimes when you do that, particularly for you know, for programming related things, you just absorb the information and you kind of forget about what did I, which things were new, like which things that I just learned, and which things that I already know. You just integrate it all into your new programming knowledge. And so that was definitely something that when I wrote the first draft of the RFC, I didn't, didn't articulate particularly well, because I it just made sense in my head. And I had forgotten: Hey, this is something I didn't know a week ago, I should probably explain it to others.

Derick Rethans 18:10

That's why we have the RFC process, right? So that other people also voice the opinion, and perhaps all the slightly confused language that make perfect sense to the author, but not necessarily to other people that read it, right?

Jordan LeDoux 18:23

Yes.

Derick Rethans 18:24

The RFC kindly mentioned that there's no backwards incompatible changes. So that's always good news, that makes it a little bit easy to accept. What was sort of the biggest pushback against the RFC?

Jordan LeDoux 18:36

Initially, actually, the most pushback that I got, when I presented it, was a round choosing never as the name, which I thought would, in my mind, I thought would be something that was barely discussed, actually. But I mean, that again, just like you said, that's why this process exists. So that everybody can actually understand, you know, the things that go into it. I did do the research into that prior, but I couldn't find a single language that had more than one bottom type. The concept of more than one bottom type itself, if you if you go back to the integers concept, that'd be like having what positive zero and negative zero or something like that. It's a concept that just intuitively when you, when you understand how the type system itself works, you feel like okay, so there's probably something wrong from a design perspective, if you have more than one bottom type. It's very easy to not be able to see that intuitively. If you don't go in and take a really deep look at how the type system works or, or how types in general work and what they mean and stuff like that. That was the biggest pushback that I got initially. That mostly just involved explaining the things that I just said about like, why is that the case? Why does that make sense? What are some examples of other languages? Just going into that kind of information. The biggest blocker at the moment, really is about, does it make sense to provide never as a parameter type? If you can't use that statically? If you can't use it in a static analysis situation, then does it make sense to ever use it? And if it doesn't make sense to ever use it, then even if it provides contravariance, is it worth adding? And that's the main discussion that's being had right now, that's not entirely resolved. I think that there is still value in doing that. I think that that argument would carry a lot more weight with me personally, if PHP had a better way of handling the situations where that might happen. That's a much larger undertaking, which I would be interested in, but is also not really not really something that would be as kind from a backward compatibility standpoint.

Derick Rethans 20:54

Which we are quite keen on.

Jordan LeDoux 20:57

Right, exactly. I don't personally see a way that this type of functionality could be provided, that could satisfy that concern, and not also invalidate enormous amounts of code that currently exist. You kind of have to choose one or the other from my understanding. I will be very pleased if somebody is able to provide me a way to, even if it's a lot of work, provide me a way that that can be accomplished without breaking a lot of code. That was one of my goals is I don't want this, I want this to be an addition to what currently exists, not not something that breaks the current typing that we have, or the way that PHP developers currently interact with the type system or anything like that. That's the current discussion. And I think the thing that is most unresolved.

Derick Rethans 21:46

I think you'll have a hard time trying to introduce breaking changes into the language. And I also think rightfully so.

Jordan LeDoux 21:54

Yeah, it's it's a good safeguard. And it's a good principle to have in general, I think. I think the way that I described it is that this type of breaking change, the type that would be necessary, wouldn't really be like the type of breaking change you expect in a major version, it would more be like a parallel language syntax. It'd be more like rewriting PHP into a different language with very similar semantics, but very different idea of what the language means underneath. Because fundamentally, in order to do that, typing could no longer be optional anywhere. It would have to be every variable, every piece of data, every function, would have to have an explicit type that the developer is able to control and modify and mutate as they want. I don't even know if type juggling would be possible with the kind of change that would be necessary. And that's, that's half of what PHP consider PHP, so.

Derick Rethans 22:55

Definitely not requiring types and places. Because I think, as I said, I think you'll have a hard time convincing people to go that way.

Jordan LeDoux 23:02

Which is the reason I didn't consider going that way, really, so.

Derick Rethans 23:06

Would you have anything else to add that I'm, that we missed discussing this RFC?

Jordan LeDoux 23:10

This RFC is really interesting to me personally, just from an intellectual perspective. I think that for a lot of users, there's not a lot of use cases where you would use it in your own programs. If you did, it would end up being in like the very core base systems, and only in a few places. In those places, it might really shine, it might be something that's absolutely incredible. Most places in most programs you would never be using, well, you would never be using this type. It's much more critical for some of the internal features things like array access, that interface, and the typing that it requires for parameters. The typing for that's pretty broken. This could be a way that we could fix that possibly, and a couple of the other internal engine features that are implemented through interfaces and things like that would also probably be helped quite a bit by this.

Derick Rethans 24:10

Thank you very much for taking the time today to talk about the Never For Parameter Types RFC.

Jordan LeDoux 24:16

Thank you for having me.

Derick Rethans 24:20

Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of a PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 92: First Class Callable Syntax

PHP Internals News: Episode 92: First Class Callable Syntax

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "First Class Callable Syntax" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is Episode 92. Today I'm talking with Nikita Popov about a first class callable syntax RFC that he's proposing together with Joe Watkins. Nikita, would you please introduce yourself?

Nikita Popov 0:36

Hi, Derick. I'm Nikita and I am still working at JetBrains. And still working on PHP core development.

Derick Rethans 0:43

Just like about half an hour ago when we recorded an earlier episode.

Nikita Popov 0:47

Exactly.

Derick Rethans 0:48

This RFC has no relation to read only properties. What is the first class callable syntax RFC about?

Nikita Popov 0:55

The context here is that PHP has the callable syntax based on literals, which is that if you just use a plain string, it's interpreted as a function name, and an array where the first element is an object, and the second one is a method name, that's methods. Or the first element is the class name, and the second one is method name, that's a static method.

Derick Rethans 1:17

I would consider this concept a bit of a hack, especially the the one with the arrays, and I reckon you feel similar and hence this RFC?

Nikita Popov 1:27

Yes, I do. So the current callable syntax has a couple of issues. I think the core issue is that it's not really analysable. So if you see this kind of like array with two strings inside it, it could just be an array with two strings, you don't know if that's supposed to actually be a static method reference. If you look at the context of where it is used, you might be able to figure out that actually, this is a callable. And like in your IDE, if you rename this method, then this array should also be this array element will also be renamed. But there's like a lot of complex reasoning that the static analyser has to perform. That's one side of the issue. The second one is that callables are not scope independent. For example, if you have a private method, then like at the point where you create your callable, like as an array, it might be callable there, but then you pass it to some other function. And that's in a different scope. And suddenly that method is not callable there. So this is a general issue with both like this callable syntax based on arrays, and also the callable type. It's a callable at exactly this point, not callable at a later point. This is what the new syntax essentially addresses. So it provides a syntax that like clearly indicates that yes, this really is a callable, and it performs the callable callability check at the point where it's created, and also binds the scope at that time. So if you pass it to a different function in a different scope, it still remains callable.

Derick Rethans 3:01

And it's guaranteed to always be callable.

Nikita Popov 3:03

Yeah, exactly.

Derick Rethans 3:04

What does the syntax like?

Nikita Popov 3:06

The syntax is the funny bit. As a bit of context. This proposal was created as an alternative or as a subset of the partial function application RFC.

Derick Rethans 3:17

That is just as hard to pronounce as first class callable syntax RFC.

Nikita Popov 3:21

Yes, that's why we say PFA. The PFA RFC has a more general feature. It also allows you to create a reference to a callable as a side effect. But more generally, it allows you to also bind some of the arguments to a fixed value. And has like finer control over for example, you can create a callable that has three required parameters, by passing three question mark arguments. While the new syntax only allows you to use the signature of the original function. But the syntax between both of those is compatible. So the new RFC is a subset of PFA. And that's why it uses the syntax where you do a normal function call, but then pass three dots or an ellipsis as arguments.

Derick Rethans 4:08

Instead of passing the function's or method's normal arguments, you use the three dots.

Nikita Popov 4:14

I think like the way to think about the syntax is that this is similar to like a variadic argument, or to the argument unpacking syntax, just that the arguments haven't yet been provided, they will be provided during the actual call. But I think the syntax was definitely the most contentious bit in the discussion of the RFC. I think this is mainly related to the fact that if you the see this code snippet, it looks a bit like, like the example code where the arguments haven't been filled in. While now this is like actual syntax.

Derick Rethans 4:44

I'm sure there's quite a few tutorials out there explaining how PHP works by using dot dot dot. That is not something you can avoid.

Nikita Popov 4:54

Well, we can avoid it, but it's fairly tricky question. I mean, the reason for this dot dot dot syntax, on one hand, this the compatibility with partial functions. I mean, the PFA, RFC has recently been declined. But in the future, we could extend the current syntax to full partial functions. And we would not end up with two different ways. So that's one benefit of the syntax. But the other part is that PHP has different symbol tables for different kinds of symbols. People often ask, why can't you just write like strlen as a plain name, not inside a string, and have that be treated as a reference to this function? And the answer to that is that we can't do that because you can't have a constant that's called strlen. Normally, that would be reference to constant and the same actually applies to all other callable types as well. So if you have something like methods, like object or method name, that would right now be interpreted as a property access. And for static methods, it will be interpreted as a as a class constant access. So we have this ambiguity here. Even if we add an additional symbol to this, for example, like for classes, we have the syntax, class name, and then scope operator class, that gives you the class name. We could do something like strlen, scope operator function, or fn, or whatever, and have that return the callable. That would work, but it also has some ambiguities. For example, if you have something like object, arrow methods, and then scope operator fn, you have this ambiguity. Is this referencing the method of that name? Or is it referencing a callable stored inside the property of that name? This is like fundamentally ambiguous. The way we would resolve it is we will just say that this index is only usable with real simple, so it will always refer to a method, and you couldn't use the syntax to convert the callable stored in a property into a proper callable. I'm actually not sure how I should distinguish these two concepts, because we have the existing callable, strings and arrays, and the first class callables, which are really closure objects.

Derick Rethans 7:11

Which actually sort of brings me to the next question which just popped in my head, which is: Does this first class scalable syntax, what is returned as return a closure or an existing callable type as we have now, with a callable type being a single string, or this array syntax that we now use.

Nikita Popov 7:28

The syntax returns a closure. Actually, the syntax works essentially the same way as the closureFromCallable method. And we do need to return a closure otherwise, we don't get this behaviour where the scope is bound at the time where the callable is created, rather than called. I think maybe going forward, I would generally recommend that people use a closure type, instead of a callable type in type declarations. I mean, you already cannot use callable for property types. Exactly due to this problem that callability is context dependent. While we only forbid it in property types, the same general problem also exists for argument and return types. And especially with the new syntax being introduced here, I think it's best to use closure instead of callable in the future.

Derick Rethans 8:18

Does that sort of mean that first class scalable syntax is syntactic sugar? Or does it do more than the closureFromCallable method?

Nikita Popov 8:27

No, I think it's effectively just syntactic sugar for closureFromCallable.

Derick Rethans 8:34

I'm actually not sure whether Xdebug is be able to do anything with these closure from callable things to begin with. So that is something I'm going to have to investigate.

Nikita Popov 8:44

Be able as in like, display that it actually refers to a specific method rather than just some kind of closure?

Derick Rethans 8:51

Yeah, because at the moment, it shows you the file name and the line numbers, it doesn't have a name right if you create normal closures, but in this case, it's important to know that it actually refers to specific methods, which is the same thing as the closureFromCallable syntax would also do, but I've never done anything with that.

Nikita Popov 9:10

But I think there is a way to get like the underlying prototype for the closure, and you should be able to determine it from there.

Derick Rethans 9:18

The first class callable syntax, are there situations where you can't use it?

Nikita Popov 9:22

One place where you don't want to use the new syntax as if you don't want to actually create a closure object, and validate callability at the point of creation. For example, creating this first class callable also implies that you have to autoload the class for a static method. If you have some kind of like large definition of of handler, of static handler methods for routes or something like that, then using the first class callable syntax would imply that you have to immediately create closure objects for all of these and immediately load all those classes. That's a use case where you might want to stick with the old syntax.

Derick Rethans 10:01

But wouldn't opcache resolve that issue really?

Nikita Popov 10:04

No, opcache is really exactly the reason why you wouldn't want to do that. For example, for my fastroute library, I cache all the data as a static array. And that's something that OpCache can cache very efficiently because it's in shared memory and accessing it is essentially zero cost. If you include something like first class callables in it, then those have to always be created at runtime, because we don't have concept like, like a persistent object. That means that this can no longer, I mean, the whole script can be in shared memory, but it still has to be executed always at runtime to construct the whole data structure. And that's going to be less efficient. To give a more clear answer to your question is that the first class callable syntax has a cost when creating the callable, and if you are in a situation where avoiding that cost is really critical for performance, that's why you wouldn't want to use it.

Derick Rethans 11:00

And instead you'd have to use the old scalable syntax that we already have.

Nikita Popov 11:03

Exactly. So for that reason, I think that the old syntax is not going to be removed in the near future at least, though maybe we can deprecate certain aspects of it. For example, the syntax also allows you to do highly context dependent things like referencing self, which is even worse than the situation with a private method, because self could refer to something different every time you call it. Those are some things we might want to deprecate early, but the main syntax itself was probably going to stay for a while.

Derick Rethans 11:34

Because callability is checked when you create the closures does that mean it also checks for strictness then? If your PHP file has been declared with strict types?

Nikita Popov 11:45

Strictness is handled the same as with closureFromCallable.The strictness is still determined at the time where the call is made, not where the callable was created, which actually, I am not a fan of how PHP handles strict types together with dynamic calls. But that's like a pre existing problem. And this isn't touching on this.

Derick Rethans 12:06

The language has many issues that probably could have been done better if it was designed from scratch. But that ship has sailed 26 years ago.

Nikita Popov 12:15

The strict types are not quite that old.

Derick Rethans 12:17

No, that is true. The language itself is of course.

Derick Rethans 12:23

Okay, thank you very much then, for taking the time this morning to talk to me about first class scalable syntax.

Nikita Popov 12:29

Thanks for having me, Derick.

Derick Rethans 12:30

Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 91: is_literal

PHP Internals News: Episode 91: is_literal

In this episode of "PHP Internals News" I chat with Craig Francis (Twitter, GitHub, Website), and Joe Watkins (Twitter, GitHub, Website) about the "is_literal" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 91. Today I'm talking with Craig Francis and Joe Watkins, talking about the is_literal RFC that they have been proposing. Craig, would you please introduce yourself?

Craig Francis 0:34

Hi, I'm Craig Francis. I've been a PHP developer for about 20 years, doing code auditing, pentesting, training. And I'm also the co-lead for the Bristol chapter of OWASP, which is the open web application security project.

Derick Rethans 0:48

Very well. And Joe, will you introduce yourself as well, please?

Joe Watkins 0:51

Hi, everyone. I'm Joe, the same Joe from last time.

Derick Rethans 0:56

Well, it's good to have you back, Joe, and welcome to the podcast Craig. Let's dive straight in. What is the problem that this proposal's trying to resolve?

Craig Francis 1:05

So we try to address the problem where injection vulnerabilities are being introduced by developers. When they use libraries incorrectly, we will have people using the libraries, but they still introduce injection vulnerabilities because they use it incorrectly.

Derick Rethans 1:17

What is this RFC proposing?

Craig Francis 1:19

We're providing a function for libraries to easily check that certain strings have been written by the developer. It's an idea developed by Christoph Kern in 2016. There is a link in the video, and the Google using this to prevent injection vulnerabilities in their Java and Go libraries. It works because libraries know how to handle these data safely, typically using parameterised queries, or escaping where appropriate, but they still require certain values to be written by the developer. So for example, when using a query a database, the developer might need to write a complex WHERE clause or maybe they're using functions like datediff, round, if null, although obviously, this function could be used by developers themselves if they want to, but the primary purpose is for the library to check these values.

Derick Rethans 2:05

That is a method of doing it. What is this RFC adding to PHP itself?

Craig Francis 2:09

It just simply provides a function which just returns true or false if the variable is a literal, and that's basically a string that was written by the developer. It's a bit like if you did is_int or is_string, it's just a different way of just sort of saying, has this variable been written by the developer?

Derick Rethans 2:28

Is that basically it?

Craig Francis 2:30

That's it? Yeah.

Joe Watkins 2:32

It would also return true for variables that are the result of concatenation of other variables that would pass the is literal check. Now, this differs from Google, because they introduced that at the language level, but not only at the language level, at the idiom level. So that when you open a file that's got queries in PHP, commonly, if they're long, basic concatenation is used to build the query and format it in the file so that it's readable. So that it wouldn't really be very useful if those queries that you see everywhere in stuff like PHPMyAdmin, and WordPress, and Drupal and just normal code weren't considered literal, just because they're spread over several lines with the concatenation operator. It's strictly not just stuff that's written by the programmer, but also stuff that was written by the programmer or concatenated, with other stuff that was written by the programmer.

Derick Rethans 3:33

Now in the past, we have seen something about adding taint supports to PHP, right? How is this different, or perhaps similar, to taint checking?

Craig Francis 3:44

At the moment today, there is a taint extension, which is something you need to go out your way to install, and actually learn about and how to use. But the main difference is that taint checking goes on the basis of say, this variable is safe or unsafe. And the problem is that it considers anything that had been through an escaping function like html_entities as safe. But of course, the problem is that escaping is difficult. And it's very easy to make mistakes with that. A classic example is if you take a value from a user, an SSH SSH, their homepage URL, if you use HTML encoding, and then put it into the href attribute of a link, that can also result in HTML injection vulnerability, because the escaping is not aware of the context which is used. Because if the evil user put in a JavaScript URL, that is in inline JavaScript, that has created a problem because taint checking would assume that because you use HTML encoding it is safe, and all I'm saying is that is it creates a false sense of security. And by stripping out all that support for escaping, it means that you can focus on libraries doing that work because they know the context, they understand the domain, and we can just keep it a much simpler, and much safer approach.

Derick Rethans 5:02

Would you say that the is_literal feature is mostly aimed at library authors and not individual developers?

Craig Francis 5:09

Yeah, exactly. Because the library authors know what they're doing. They're using well tested code, many eyes over it. The problem libraries have at the moment is that they trust the developer to write things themselves. And unfortunately, developers introduce a lot of injection vulnerabilities with those strings before they even get into the library.

Derick Rethans 5:30

How would a library deal with with strings that aren't literal then?

Craig Francis 5:35

So it really depends on each individual example. And the RFC does include quite a lot of examples of how each one will be dealt with. The classic one is, let's say you're sorting by a column in a database, because if we're dealing with SQL, the field name might come from the user. But that is also quite a risky thing to do if you start including whatever field name the user wrote. So in the RFC, I've created a very simple example where the developer would create an array of fields that you can sort by, and then whatever the user provides, you search through that array, and you pull out the one that you that matches and is fine. And therefore you are pulling out a literal and including into the SQL. To be fair, these ones are quite unique. And each one needs to be dealt with in its own way. But I've yet to find an example where you can't do it with a literal. Having said that, I think Larry Garfield actually gave an example where a content management system changed its database structure. And the way that would work is the library would have to deal with it, they would receive the value for a field, and then that field would be escaped and treated as a field, it understands it as a field, and it will process it as such, then it can include into the SQL, knowing full well that everything else in that SQL is a literal, and then it can just build up SQL in its own way internally.

Derick Rethans 6:58

Okay, talking a little bit about the implementation here. Since PHP seven, we have this concept of interned strings, or maybe even before that actually, I don't quite remember. Which is pretty much a flag on each string and PHP that says, this's been created by the engine, or by coconut. Why would strings have to have an extra flag here to remember that it is created by the programmer?

Joe Watkins 7:21

Well, interned does not mean literal. It's an optimization in the engine, should we use strings. We're free to do whatever we want with that. At the moment, it by happenstance, most interned strings are those written by the programmer. If you think about the sort of strings that are written by the programmer, like a class name, when those things are declared internally, by an extension, or by core code, those things are interned as if they were written by the programmer. They don't mean literal, we're free to use interned strings for whatever we want. For example, a while ago, someone suggested that we should intern keys while JSON decoding or unserializing. It didn't happen, but it could happen. And then we'd have the problem of, well, how do we separate out all this other input. There is another optimization attached to interned strings, which is one character strings, where if you type only one character, or you call a Class A or B, or whatever, the permanent interned string will be used. That results in when the chr function is called, that results in the return of that function always being marked as interned. So it would show as literal, which is not a very nice side effect. And that's just a side effect that we can see today. We don't want to reuse the string really, it does need to be distinct. Also, if you're going to concatenate, whether you do it with the VM or a specific function, obviously, you need to be able to distinguish between an interned string and a literal string, which interned means it has a specific life cycle and specific value. And we can't break that.

Derick Rethans 9:00

So there are really two different concepts, is what you're saying, and hence, they need to have a special flag for that?

Joe Watkins 9:06

Yeah, they're very, two very separate concepts. And we don't we don't want to restrict the future of what interned strings may be used for. We don't want to muddy the concept of a literal.

Derick Rethans 9:16

Of course, any sort of mechanism that languages built into solve or prevent injections in any sort of form, there's always ways around it. Theoretically, how would you go around the is_literal checks to still get a user inputted value into something that passes the is_literal check?

Craig Francis 9:36

Generally speaking, you would never need it because the library should know how to deal with every scenario anyway. And it's not that difficult. We're only talking about things like in the database world, you'll be taking value from field names and therefore it should receive field names or table names. And, you know, we are providing a guardrail as a safety net. And what should happen is that the default way in which programmers work should guide them, to do it the right way. We're not saying that you can't do weird things to intentionally work around this. A really ugly version, which you should never do, but use eval and var_export together, it's horrible. But if you are so desperate, you need to get around this. That's what we're doing it. But in reality, we can't find any examples where you'd actually need to do this.

Joe Watkins 10:22

I would say that, hey, there's this idea that most people writing PHP are using libraries, and they're using frameworks. I don't actually find that to be true. I've been working in PHP for a long time. And most of the big projects I've worked on for a long time did not start out using frameworks. And they did not start out using libraries. They look a bit like that today, but their core, they are custom. There may be a framework buried in there. But there is so much code that the framework is a component and is not the main deal. Most code, we actually do write ourselves, because that's what we're paid to do. I think we don't decide how people are going to use it, and we don't decide where they're going to use it. The fact is, like Craig said, it's a guardrail that you can work around easily. And if you find a use case for doing that, then we shouldn't prejudge, and say, well, that's the wrong thing to do. It might not be the wrong thing to do. For example, an earlier version of the idea included support for integers. We considered integers safe, regardless of their source. If you wanted to do that, in your application, you could do that very easily and still retain the integrity of the guardrail is not compromised. I wouldn't focus on this is for libraries, and this is for frameworks, because these things become so small in the scheme of things that they're meaningless. I mean, most of the code we work on is code that we wrote, it is not frameworks.

Derick Rethans 11:48

That also nicely answers my next question, which is what's happened to integers, which have now nicely covered. The RFC talks about that as hard to educate people to do the right thing. And that is_literal is more focused, so to say, on libraries, and perhaps query building frameworks as the RFC alludes to. But I would say that most of these query building tools or libraries already deal with escaping from input value. So why would it make sense for them to start using is_literal if you're handling most of these cases already anyway?

Craig Francis 12:24

If you look at the intro of the RFC, there's a link to show examples of how libraries currently receive the strings. And you're right about the Query Builder approach is a risky thing, I would still argue it's an important part. That's why libraries still provide them. Doctrine has a nice example of DQL. The doctrine query language is an abstraction that they've created, which is also vulnerable to injection vulnerabilities. And it gives the developer a lot more control over a very basic API. I still think people should try and use the higher level API's because they do provide a nice safe default, but that depends on which library use, they're not always safe by default. So for example, when you're sort of saying: I want to find all records where field parameter one, is equal to value two, a lot of the libraries assumed that the first parameter there is safe and written by the developer. They can't just necessarily simply escape it as though it's a field because that value might be something like date, bracket, field, bracket, and it's sort of relying on the developer to write that correctly, and not make any mistakes. And that hasn't proven to be the case, you know, they do include user values in there.

Derick Rethans 13:43

Just going back a little bit about some of the feedback, because feedback to the RFC has happened for quite some time now. And there were lots of different approaches first tried as well, and suggested to add additional functions and stuff like that. So what's been the major pushback to this latest iteration of the RFC?

Joe Watkins 14:01

So I think the most pushback has come from an earlier suggestion that we could allow integers to be concatenated and considered literal. We experimented with that, and it is possible, but in order to make it possible, you have to disable an optimization in the engine, that would not be an acceptable implementation detail for Dmitri. It turns out we didn't actually, we don't need to track their source technically, but it made people extremely uncomfortable when we said that, and even when we got an independent security expert to comment on the RFC, and he tried to explain that it was no problem, but it was just not accepted by the general public. I'm not sure why.

Derick Rethans 14:45

All right. Do you have anything to add Craig?

Craig Francis 14:48

The explanation given by people is they liked the simpler definition of what that was as if it's a string written by the developer. Once you start introducing integers from any source, while it is safe, it made people feel, yeah, what is this. And that's where we also had the slight issue because we had to find a new name for it. And I did the silly thing of sort of asking for suggestions, and then bringing up a vote. And then we had, I think it's 18 to three people saying that it should be called is_trusted, and you have that sinking moment of going, Oh, this is going to cause problems, but hey, democracy. It creates that illusion that it's something more. So that's why we sort of went actually, while I like Scott's idea of having the idea of maybe calling it is_noble. It is a vague concept, which people have to understand. And it's a bit strange. Whereas going back to the simpler, original example, they've all seem to grasp grasp of that one. And we could just keep with the original name of is_literal, which I've not heard any real complaints about.

Derick Rethans 15:53

I think some people were equivalenting is_trusted with something that we've had before in PHP called Safe mode, which was anything but of course.

Craig Francis 16:02

Yes, no, definitely.

Derick Rethans 16:03

We're sort of coming to the end of what to chat about here. Does the introduction of is literal introduce any BC breaks?

Craig Francis 16:11

Only if the user land version of is_literal, which I'm fairly sure is going to be unlikely. So on dividing their own function called that.

Derick Rethans 16:18

Did you check for it?

Craig Francis 16:20

Yes.

Derick Rethans 16:21

So if you haven't found it, then it's unlikely to to exist.

Craig Francis 16:24

There are still private repositories, we can't shop through all their show, check through all their code. But yeah.

Derick Rethans 16:29

Did I miss anything?

Craig Francis 16:31

We covered future scope, which is the potential for a first class type, which I think would be useful for IDs and static analysers. But this is very much a secondary discussion, because that could build on things like intersection types, but we still need to focus on what the flag does. And there's also possibility of using this with the native functions themselves, but we do have to be careful with that one, because, you know, we got things like PHPMyAdmin. We have to be able to make the output from libraries as trusted because they're unlikely to still be providing a literal string at the end of it. So that's a discussion for the future. And the only other thing is that, you know, the vote ends on the 19th of July.

Derick Rethans 17:08

Which is the upcoming Monday. How is the vote going? Are you confident that it will pass?

Craig Francis 17:13

Not at the moment, we're sort of trying to talk to the people who voted against it. And we've not actually had any complaints as such. The only person who sort of mentioned anything was saying that we should rely on documentation and the documentation is already there. And it's not working. I think a lot of people just voted no, because they just sort of going well, that's the safe default. I don't think it's necessary. Or, you know, I'd like the status quo. And we still are trying to sell the idea and say: Look, it's really simple. It's not really having a performance impact. And it can really help libraries solve a problem, which is actually happening.

Derick Rethans 17:46

Is this something that came out of the people that write PHP libraries or something that you came up with?

Craig Francis 17:52

So I've come gone to the library authors and suggested you know, this is how Google do it. Would you like something similar? And we've certainly had red bean and Propel ORM saw show positive support for that. And I've also talked to Matthew Brown, who works on the Psalm static checking analysis. He's very positive about it, so much so that Psalm now also includes this as well. Obviously, static analysis is not going to be used by everyone. So we would like to bring this back to PHP so that libraries can use it without relying on all developers using static analysis.

Derick Rethans 18:25

Thank you very much. Glad that you were both here to explain what this is_literal RFC is about.

Craig Francis 18:31

Thank you very much, Derick.

Joe Watkins 18:33

Thanks for having us.

Derick Rethans 18:37

Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 90: Read Only Properties

PHP Internals News: Episode 90: Read Only Properties

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Read Only Properties" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 90. Today I'm talking with Nikita Popov about the read only properties version two RFC that he's proposing. Nikita, would you please introduce yourself?

Nikita Popov 0:33

Hi, Derick. I'm Nikita and I do PHP core development work by JetBrains.

Derick Rethans 0:39

What does this RFC proposing?

Nikita Popov 0:41

This RFC is proposing read only properties, which means that the property can only be initialized once and then not changed afterwards. Again, the idea here is that since PHP 7.4, we have typed properties. A remaining problem with them is that people are not confident making public type properties because they still ensure that the type is correct, but they might not be upholding other invariants. For example, if you have some, like additional checks in your constructor, that string property is actually a non empty string property, then you might not want to make it public because then it could be modified to an empty value for example. One nowadays fairly common case is where properties are actually only initialized in the constructor and not changed afterwards any more. So I think this kind of mutable object pattern is becoming more and more popular in PHP.

Derick Rethans 1:35

You mean the immutable object?

Nikita Popov 1:37

Sorry, immutable. And read only properties address that case. So you can simply put a public read only typed property in your class, and then it can be initialized once in the constructor and you can be... You don't have to be afraid that someone outside the class is going to modify it afterwards. That's the basic premise of this RFC.

Derick Rethans 1:57

But it also means that objects of the class itself can modify that value any more, either.

Nikita Popov 2:01

Exactly. So that's, I think, a primary distinction we have to make. Genuinely, there are two ways to make this read only concept work. One is like actually read only or maybe more precisely init once, which is what this RFC proposes. We can only set that once and then even in the same class, you can't modify it again. And the alternative is the asymmetric visibility approach where you say that, okay, only in the public scope, the property can only be read, but in the private scope, you can modify it. I think the distinction there is very important, because read only property tells you that it's genuinely read only, like, if you access a property multiple times in sequence, you will always get back the same value. While the asymmetric visibility only says that the public interface is read only, but internally, it could be mutated. And that might like be, you know, intentional, just that you want to like have your state management private, but that the property is not supposed to be immutable.

Derick Rethans 3:05

How's this RFC different from read only properties, version one?

Nikita Popov 3:09

Read only properties version one was called write once properties. I think the naming is kind of one of the more important differences. The new RFC is also effectively write once, but I think it's really important to view it from an API perspective as read only because that's what the user gets to see. While write once gives you this impression, that is know that you can externally from outside the class like passing the value once I know like dependency injection, that is what they would think of when they hear write ones. And from the technical site, there is a related difference. And that difference is that new RFC only allows you to initialize read only properties inside the class scope. That means if you do something really weird, like leaving a property uninitialized in the constructor, it's not possible for someone outside the class to initialize it instead, just like an extra safety check. Of course, you can, as usual bypass that, like if you're writing a serializer or hydrator, you can use reflection to initialize it outside the class. But normally you won't be able to.

Derick Rethans 4:13

Does that mean that these read only properties can also be initialized from a normal method instead of just from the constructor?

Nikita Popov 4:19

That's true. Yes, that's possible.

Derick Rethans 4:21

So the RFC talks about that a read only property cannot be assigned from outside a class. Does that mean it can be set by a different object of the same class?

Nikita Popov 4:29

Yeah, that's how scoping works in PHP. Scoping is always class based, not object based, it's a common misconception that if you have like private scope, you can access different objects of the same class.

Derick Rethans 4:42

That was a surprise to me the first time I ran into that, but once you know, it's obvious that it's should work all over the place right. Now, the RFC states that you can only use read only with typed properties. Why is that?

Nikita Popov 4:55

So this is related to the initialisation concept, that typed introduced. Typed properties start out, if you don't give them a default value, they start out in an uninitialized state. And we reuse that state for read only properties. You can only assign to the property while it's uninitialized. And once it's initialized, you cannot assigned to it any more or even unset it back to an uninitialised state. For non typed properties, you also can get into this uninitialized state by explicitly unsetting it. But the problem is that this is not the default state. Untyped properties always have no default value, even if you don't specify one, which effectively means that these properties are always initialized. So they will be kind of useless if you used read only with them. Which is why we make this distinction to avoid any confusion. And if you want to use an untyped read only property, you do that by using the mixed type, which is the same but has the initialisation semantics of typed properties.

Derick Rethans 5:56

What would that mean if you have say, a resource or class typed property with a read only keyword? Can you not read or write to a resource any more? Or modify properties on an object, that is the value of a read only typed property?

Nikita Popov 6:12

No, you can still modify those, because we have to distinguish the concepts of like exterior and interior mutability here. So objects and resources are... Well, I mean, we often say they are passed by reference, which is not strictly true, because those are not PHP references. But the important part is that they only pass around some kind of handle and you can still modify the inside of that handle. What you can't do is you can't reassign to a different resource or reassign to a different object, it's always the same object, the same resource, but the insides of the objects, those can change. Of course, if your object also only contains read only properties, then you won't be able to change this.

Derick Rethans 6:55

Okay, and that answered another question that I have: how it possible to make the whole object read only or on all of its properties? Where the answer is by setting all the properties to read only.

Nikita Popov 7:05

We could like add a read only class modifier that makes all the properties implicitly read only but maybe future scope.

Derick Rethans 7:13

Is there a reason why read only properties can't have a default value?

Nikita Popov 7:18

So this is, again, same issue with initialization. If they have a default value, then they're already initialized. So you can't overwrite it. Like we could allow it, but you just would never be able to change them from the default value, which is something we could allow it just wouldn't be very useful.

Derick Rethans 7:36

in PHP, eight zero PHP introduced promoted or constructor promoted properties, I think it's the full name of it. How does this read only property tie in with that? Because can you set the read only flag on a constructor promoted property?

Nikita Popov 7:49

Yeah, you can. So that works as expected. Important bit there is that with the promoted properties, you can set the default value. And the reason is that for promoted properties, the default value is not the default value of the property. It's the default value of the parameter. And that works just fine.

Derick Rethans 8:07

But again, it would be a constant.

Nikita Popov 8:10

No, in that case, it's not a constant, because it's just a default value for the parameter and the code we generate, we just assign this parameter to the property IN the constructor.

Derick Rethans 8:19

Which means that if you instantiate the class with different arguments, they would of course, override the default value of the constructor argument value, right?

Nikita Popov 8:28

That's right. If you pass it explicitly, then we use the explicit value. If you don't pass an argument, then we use the default of the argument, but it still gets assigned to the property through the like automatically generated code for the promotion.

Derick Rethans 8:42

Promotion isn't actually... there isn't actually really a language feature, but more of a copy and paste mechanism.

Nikita Popov 8:50

Yes, this is pure bit of syntax sugar.

Derick Rethans 8:53

Which sometimes can be handy. But all kinds of interesting things that we added to properties or type system in general, usually inheritance comes into play. Are there any issues here with read only and inheritance? What does it do to variants or traits or things like that, all the usual things that we need to take into consideration?

Nikita Popov 9:13

Most of our rules for properties, most of our inheritance rules for properties are invariant, which means that the property in the child class has to basically look the same as the property in the parent class. And this is also true for read only properties. So we say that if the parent property is read only the child property has to be read only, and the same the other direction and for the type as well. So we say that the type of the read only property has to match with the parent property. Those rules are very conservative and like they could theoretically maybe be relaxed in some cases. For read only properties on read only is like kind of return type and return types in PHP are covariant. So one could argue that the type should also be covariant here The problem is that like here, we get into this little detail that read only is really not read only, but init once. So there is this one assignment. And assignment is more like an argument. So is contravariant. So if we made them covariant, then we would basically lie about this one initializing assignment. And we don't like to lie about these things, at least without some further consideration.

Derick Rethans 10:24

Read only doesn't change the variance rules at all, which is different from the property accessors RFC we spoke about a couple of months ago. Talking about that this is actually a competing RFC, or do they tie together?

Nikita Popov 10:37

Well, I should first say that probably the variance stuff defined in the accessors RFC maybe isn't right, for the same reason and should also like just keep things invariant. But it's more complicated there because it also has abstract properties, or abstract accessors. And then the abstract case, that's where the different variance rules are safe. To get back to your question, they are kind of competing, but could also be seen as just complimentary. So read only is basically a small subset of the accessors feature. I mean, accessors also allow you to implement read only properties as part of a much more general framework. And read only properties cover like only this small but very useful corner in a way that's much simpler. And I think that's not just simpler from a technical perspective, but also simpler to understand for the programmer, because there are a lot of less edge cases involved.

Derick Rethans 11:32

Because we're getting pretty close to feature freeze now. Are you still intending to take the property accessors RFC forward for PHP eight one?

Nikita Popov 11:41

No, definitely not.

Derick Rethans 11:43

But you have taken the read only properties one forwards because we're already voting on it. At the moment, it looks like the vote is going to pass quite easily. So there's that. We don't have to speculate about that, like we had to do with previous RFCs. Do you think I've missed anything talking about read only properties? Or have we covered everything?

Nikita Popov 12:01

We've missed one important bit. And that's the cloning. So the read only properties RFC is not entirely uncontroversial. And basically all the controversy is related to cloning. The problem are wither methods as they're used by PSR seven, for example, which are commonly implemented by cloning the original object, and then modifying one property in it. This doesn't work with read only properties, because you know, if you clone the object, then it already has all the properties initialized. So if you try to change something, then you'll get an error that you're modifying read only property. So these patterns are pretty fundamentally incompatible. There are possible solutions to the problem primarily a dedicated syntax that goes into the code name of cloneWith where you clone an object, but override certain properties directly as part of the syntax, which could bypass the read only property checks. The contention here is basically there are some people consider this clone support and these wither methods are very important. Well, I mean, read only properties are about immutable objects. And like PSR seven is a fairly popular read only object pattern, they think that without support for cloning or for withers, the feature loses all value. That's why the alternative suggestions are either to first introduce some more cloning functionality. So like this mentioned, cloneWith or instead implement asymmetric visibility, which does not have this problem because inside the class, you can always modify the property. So it works perfectly fine with cloning. My personal view on this is well I personally use cloning in PHP approximately never. So this is not a big loss for me, and I'm happy for this from my perspective, edge case, to be addressed at a later time. I can understand that other people put more value on this then I do.

Derick Rethans 13:55

There doesn't seem to be enough push against that for this RFC to fail. So there is that. Thanks Nikita for explaining the read only properties RFC which will very likely see in PHP eight one. Thanks for taking the time.

Nikita Popov 14:08

Thanks for having me Derick, once again.

Derick Rethans 14:13

Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 89: Partial Function Applications

PHP Internals News: Episode 89: Partial Function Applications

In this episode of "PHP Internals News" I chat with Larry Garfield (Twitter) and Joe Watkins (Twitter, GitHub, Blog about the "Partial Function Applications" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 89. Today I'm talking with Larry Garfield and Joe Watkins about a partial function application RFC that they're proposing with Paul Crevela and Levi Morrison. Larry, would you please introduce yourself?

Larry Garfield 0:36

Hello World. I'm Larry Garfield or Crell on most social medias. I'm a staff engineer for Typo3 the CMS. And I've been getting more involved in internals these days, mostly as a general nudge and project manager.

Derick Rethans 0:52

And hello, Joe, would you please introduce yourself as well?

Joe Watkins 0:55

Hi, I'm Joe, or Krakjoe, I do various PHP stuff. That's all there is to say about that really.

Derick Rethans 1:02

I think you do quite a bit more than just a little bit. In any case, I think for this RFC, you, you wrote the implementation of it, whereas Larry, as he said, did some of the project management, I'm sure there's more to it than I've just paraphrased in a single sentence. But can one of you explain in one sentence, or if you must, maybe two or three, what partial function applications, or I hope for short, partials are?

Larry Garfield 1:27

Partial function application, in the broadest sense, is taking a function that has some number of parameters, and making a new function that pre fills some of those parameters. So if you have a function that takes four parameters, or four arguments, you can produce a new function that takes two arguments. And those other two you've already provided a value for in advance.

Derick Rethans 1:54

Okay, I feel we'll get into the details in a moment. But what are its main benefits of doing this? What would you use this for?

Larry Garfield 2:01

Oh, there's a couple of places that you can use partial application. It is what got me interested. It's very common in functional programming. But it's also really helpful when you want to, you have a function that like, let's say, string replace takes three arguments, two of which are instructions for what to replace, and one of which is the thing in which you want to replace. If you want to reuse that a bunch of times, you could build an object and pass in constructor values and save those and then call a function. Or you can just partially apply string replace with the things to search for, and the things to replace with and get back a function that takes one argument and will do that replacement on it. And you can then reuse that over and over again. There are a lot of cases like that, usually use in combination with functions that wants a callback. And that callback takes one argument. So array map or array filter are cases where very often you want to give it a function that takes one argument, you have a function that takes three arguments, you want to fill in those first ones first, and then pass the result that only takes one argument to array map or a filter, or whatever. So that's the one of the common use cases for it.

Derick Rethans 3:15

That's the benefits and some of its background comes from functional programming, as you've just mentioned. What is the syntax that you're proposing and some of the semantics?

Larry Garfield 3:26

The syntax that we've developed, are two placeholders that you can use in a function call. So if you're calling a function as you normally would, but for one of the arguments, you pass a question mark, or at the tail end, you have an ellipsis (dot dot dot), then that tells the engine: This is not a function call. This is a partial application. And what it will do is return not the result of the function but return a closure object that has the the arguments that correspond to those question marks. And then when called with those arguments, we'll pass those along with the original function. Probably easier to explain, if I use a concrete example, using the string replace example we talked about before, you would call it with str_replace, the example from the RFC, hello, hi, question mark. What that gives you is a callable, a closure that has one argument, which will take its type and name from str_replace. So the third argument to str_replace essentially gets copied into that closure. And what closure does internally when you call it with that one argument is it just calls string replace with hello, hi, and whatever argument you gave it and returns that value. It is conceptually very, very similar to just writing a short lambda or an arrow function that takes one arguments and calls string replace hello, hi, and that argument. In most cases, it ends up functioning almost exactly like that. There's a few subtle differences in a few places. But most of the time, you can think of it working essentially like that. The question mark means one required argument only. The dot dot dot means zero or more arguments, if you want to, say provide the first argument to a function, and then dot dot dot would mean: And then all of the other arguments, however many there are, even if it's that zero, those are what's left, which languages other languages that have partial application as a first class feature, usually end up doing it that way where you can only pre fill from the left. PHP, because the placeholder lets us do it in any order. So we can skip over arguments if we want to, which is quite nice. But it means that you can take a function and reduce it to, I want to prefill just these two arguments and leave these three arguments for the new function, or I want to prefill these arguments from the left, and then everything else, whatever it is, is left. It also lets you do cute things like if you provide all of the arguments to a function, and then just tack on a dot dot dot the end of it, then you get back a closure that takes essentially zero arguments. But when called, will call that other function. So it's lets lets you really easily build a delayed function as you need to.

Derick Rethans 6:15

When do the arguments to the function get evaluated then?

Larry Garfield 6:18

Arguments are evaluated in advance. So this is the subtle difference between partial application and the short lambda syntax. In a short lambda, what happens is, essentially, that entire expression on the right hand side gets wrapped up into a closure. And so any arguments that are compound like they have a function call that is inside one of the placeholders, or one of the arguments, that'll get evaluated later. With partial application, the function that is in a parameter position gets evaluated first and reduced to a value. And that value gets partially applied to the function. 90% of the time, that's not going to be an issue. There are a few cases where doing it one way or the other may be subtly different, but you'll spot those fairly easily.

Derick Rethans 7:02

So the RFC talks about things that you can do, but also a few things that you cannot do or don't want to do yet. What are these things that partials won't support, or run support yet, at least?

Larry Garfield 7:13

The main thing that it doesn't support is named placeholders. You can pre fill a value or an argument with a named named argument. But not a named placeholder. Those have to be positional. Named placeholders are complicated to implement, and run into a question of, if you provide those in a different order, does that also change the order of the arguments in the partially applied function that you get back in that closure? And there's a good argument to be made that either way is logical. And so we're like, no, does not deal with it, too complicated. We'll just positional only. And you cannot specify an optional arguments either. It's just again, too complicated. Things get too weird. If you have those advanced cases, use our short lambda, that works just fine. If you want to just make a new function that defers to a new function, and change its API in the process, short lambda works fine. And it's still quite short.

Derick Rethans 8:13

I know the RFC talks a little bit about references, but I don't like talking about references. So let's skip that part. In my opinion, they should be removed from the language. But I know we can't.

Larry Garfield 8:22

There's occasionally used for them. But very occasionally.

Derick Rethans 8:25

There's a bunch of technical things that I also want to chat about. And hopefully, Joe, if you want to fill in, I'd be more than welcome to hear your opinions on these things. But the first one is that PHP has this thing called func_get_args. How does that work with these partials? How does that tie in together?

Joe Watkins 8:42

It should mostly behave as if you've invoked the function directly. We don't want there to be a huge discrepancy between. The callee know whether they've been called through partial application or complete application. It should be the same.

Derick Rethans 8:58

That is good to know. I mean, I always like it how things work as people expect them to work, right?

Joe Watkins 9:03

Yeah.

Derick Rethans 9:04

We already have used the dot dot dot operator for variadics. But you're reusing the dot dot dot, or ellipses, as you more eloquently call it earlier. Here again, as well, is that not going to cause issues? Or does that tie in well together?

Joe Watkins 9:18

Well, there's quite a lot of debate about what's the right symbol to use. I think it's dot dot dot, and I think Larry agrees with me. But there's some people who want to stick an extra question mark on the end, which to me looks like it reads zero to one. And to Larry, it looks like an extra character that's just not needed. Other people say it makes sense for them. But if you can type three characters and not four, I mean, you need a really good argument. The arguments that have been put forward so far don't really make very much sense for me. Maybe we should ask that question and it doesn't really matter. In the end, what the syntax is, is if it's a difference between it getting in and not getting in, then we'll just put the extra question mark on there. I don't really have a really good argument to change it like to be like that.

Derick Rethans 10:05

To be honest, to me, it looks like you then have two placeholders.

Joe Watkins 10:09

Yeah.

Derick Rethans 10:10

I don't feel the need for it.

Joe Watkins 10:11

That's also another argument because we've introduced this one symbol, and then this other symbol, and then you put them together. And that's two things. I mean, you can't have one and one equals one.

Derick Rethans 10:20

Fair enough. The RFC does touch on another quite interesting thing, I think, which is constructors, which it also be able to partially apply. But of course, you've mentioned that, that arguments get applied immediately when you do the substitution, when you do the partial application. But of course, the constructor is a bit weird because a constructor runs immediately after an object has been constructed. So how does that work together with partials?

Joe Watkins 10:47

So at first, we made it so like if you invoke a constructor with reflection, and you just invoke it over and over again, it'll invoke it on the same object, you won't get back a new object. It's not the constructor that returns the object, it's the new operator. So first, we had a bit dumb. And we did just like what reflection does. And if you applied to a constructor, you'd get back a closure that just repeatedly invokes the constructor, which is, as Larry called it, quite naive. So we went back and revisited that. And so now it acts like a factory. Every time you invoke the closure return from an application, you get a brand new object, which is more in line with what people expect. And it's also quite cool. It's one of my favourite bits, actually as it turns out.

Derick Rethans 11:31

In my opinion, it also makes more sense than then having an apply to the same object over and over again. Whether I'd like it or not, I don't know yet.

Joe Watkins 11:39

Oh, the other option is traditional constructors to avoid the surprising behaviour. But that would be just a strange.

Larry Garfield 11:45

There are a lot of use cases where you want to take a bunch of values, convert them to objects using an array map, supporting constructors for that makes total sense to me.

Derick Rethans 11:54

And I would probably say, though, that I would prefer not allowing it over it applying over the same object over again. You've touched a little bit on some common cases where you want to use this, do you perhaps have some other ideas where this might be really useful?

Larry Garfield 12:10

So there's three use cases that we think are probably going to be the lion's share. One is to just use the dot dot dot operator. So you have some function or method call, call it with dot dot dot, and that's it. You prefill nothing, which gives you back a closure that is identical in signature to the function or the method that you're applying it to. Everything we've said about functions applies the methods here as well. Which means we now effectively have a new way to refer to a function or a method and make a callable out of it, that doesn't involve just sticking it into a string. You just say, hey, function called dot dot dot, or an arrow bar, parentheses, dot dot dot, parentheses. And now you can turn any function or method into a callable and pass that around. And it's still, it's not wrapped up into the silly array format, it's still accessible to static analysers and refactoring tools. Hopefully, with this, you will never need to refer to a function name using a string ever again, never refer to a method call as an array of object and method. So that that just is not needed any more in the vast majority of cases.

Derick Rethans 13:20

That alone is probably worth having them, maybe.

Larry Garfield 13:23

And Nikita had an RFC that was doing just that, and nothing else. It's kind of a junior version of this. I don't think that's necessary, the full full scope here works, and gives us that. The second use case that I think is going to be common are unary functions. That's functions that take a single argument. More to the point, as I mentioned before, a lot of functions take a callback. And that callback needs a single argument, array map, array filter, some validation routines, a lot of other things like that. So it's now stupidly easy to take any arbitrary function or method and turn it into a single parameter function, which you can then pass as a callback to array map, array filter, all these other tools, and it just becomes really easy to pre fill things that way. The third is the other one I mentioned earlier, if you pre fill all the arguments, and then just put a dot dot dot at the very end, which means zero or more, you now have a function that takes no arguments, but calls the original function you specified with all the arguments you specified. This often the case for default values, where I want to have a default value available, but don't want to take the time to compute it in advance because it might be expensive. Whatever function it is that will determine that default value, I just partially apply that and give it all the arguments and I get back a callable. That creating a callable is dirt cheap, but when I actually need that value, I can then call it at that time, but it won't actually get called unless I need it. That's another use case that we expect to be common. There are no doubt others that we haven't thought of, or that will be less common, but still useful. I think this will probably replace a large chunk of the use cases for short lambdas. Not because short lambdas are bad, they're wonderful. But so many of them convert a function to a simpler function. And this gives us an even more compact, more readable syntax for that, with even less extra symbols and flotsam around it.

Derick Rethans 15:24

I saw, hopefully as a joke, saying that, instead of using the question mark, we should use dollar sign dollar sign, and then we should call the token name T_BLING.

Larry Garfield 15:36

This RFC actually has a storied history. Several years ago, Sara Golemon had proposed porting the pipe operator from Hack to PHP. The pipe operator is an operator available in a lot of different languages that lets you string together a series of functions. So you pass a function, pass an argument into one function, its results you pass to the next function, its results, you pass the next function and so on, which is a good case for unary functions. In Hack's syntax, they don't use a function on the right hand side, they use an arbitrary expression, and then dollar dollar as a placeholder for where to put the value from the left hand side from the previous step. It's the only language that does that.

Derick Rethans 16:20

The other language that does it is bison.

Larry Garfield 16:23

Or Bison also does that style of?

Derick Rethans 16:25

It does something weird like that, yeah. Have a look at the grammar file.

Larry Garfield 16:29

I've looked in there. It's scary. So at the time, she didn't actually put an implementation in for it. But there was some discussion about it. I joked that if she wanted to do that, she should call it T_BLING. And she thought it was hilarious, but never went anywhere. A year ago, I started working on a pipe operator RFC that did just the pipe part, but used a callable on the right hand side, instead of an expression, more like F#, and Haskell, and other languages that have a pipe operator. And their main response to that was, we'd like this, this is cool, except that just using short lambdas on the right all the time to make unaries is too ugly. We want partial application first. So I spent a while trying to bribe someone with more experience and knowledge than me to work on partial application. I tried bribing Ilya Tovolo, to do so by working with him on enumerations. And we got enumerations in, but he doesn't have the time to work on partial application. Levi and Paul had already written an RFC for partial application that had no implementation. It's just a skunkworks, essentially. Then a few weeks ago, Joe pops up and starts working on an implementation for partials. And I, to this day, don't know what interested him in it. But I'm very happy about this fact. So as we updated the RFC, I knew that people want a bike shed about syntax. So I threw that in as a joke. I don't think we're actually going to do that. It's just a little inside reference that is now no longer inside.

Derick Rethans 17:56

Joe what made you work on partials, then?

Joe Watkins 17:58

It's interesting to write. I've had my fun whether it gets in or not.

Derick Rethans 18:02

Sometimes that's the case, right? So just working on this is all the fun.

Larry Garfield 18:06

Sometimes it's fun to just run down rabbit holes for the heck of it. And sometimes really cool things can come out of that sometimes.

Derick Rethans 18:12

At some point, I might have to implement support for partials like I have for closures in Xdebug as well. Because at some point, people might want to debug these things. So I'm a little bit interested in how do these the closures that it generates? Where does it store the already applied arguments?

Joe Watkins 18:29

So partials have the same binary struct up to this point or of the closure, and then after that there's some extra fields.

Derick Rethans 18:36

Would they still have the names?

Joe Watkins 18:39

No, because named arguments aren't actually named, that information is lost. By the time we've got them, we don't have any name information. We've only got their correct position, according to the call that was made.

Derick Rethans 18:50

And every argument that hasn't been filled and doesn't have a special placeholder in there, or does it keep track of which ones have been filled in?

Joe Watkins 18:56

We've got two special placeholders internally, you won't see as undef or null or anything.

Derick Rethans 19:02

Okay, that's good to know. What has the reaction been so far?

Larry Garfield 19:05

Slightly positive. There were a lot of discussions early on about do we support argument reordering? And should it use a single placeholder or two separate placeholders? Originally, we had one and realized after a while, that doesn't actually work. There're use cases where that will be confusing. Overall, the feedback has been quite positive, and I fully expect that to pass. Really the only question people are still debating about at this point is ellipsis versus ellipses question mark.

Joe Watkins 19:34

Yeah, I think the first version of the RFC was quite well received. Someone said we could document it as to make a partial sprinkle or question mark over it and hope for the best.

Derick Rethans 19:44

Oh, that's good to hear. With feature freeze coming not pretty soon now. When do you think you're putting this up for a vote?

Larry Garfield 19:51

Probably in the next couple of days. The only question I think is whether we include a second question for which variadic placeholder to use, which syntax/ Or if we just say it's dot dot dot, go away. Other than that it should go to a vote probably before this episode airs.

Derick Rethans 20:06

Thank you very much, both of you for taking the time to me today to talk about partials.

Larry Garfield 20:11

Thank you again Derick, hopefully see you once more on this season.

Joe Watkins 20:15

Thanks Derick, see you soon.

Derick Rethans 20:21

Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time.


PHP Internals News: Episode 88: Pure Intersection Types

PHP Internals News: Episode 88: Pure Intersection Types

In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Pure Intersection Types" RFC that he has proposed.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 88. Today I'm talking with George Peter Banyard about pure intersection types. George, could you please introduce yourself?

George Peter Banyard 0:30

Hello, my name is George Peter Banyard. I work on PHP code development in my free time. And on the PHP Docs.

Derick Rethans 0:36

This RFC is about intersection types. What are intersection types?

George Peter Banyard 0:40

I think the easiest way to explain intersection types is to use something which we already have, which are union types. So union types tells you I want X or Y, whereas intersection types tell you that I want X and Y to be true at the same time. The easiest example I can come up with is a traversable that you want to be countable as well. So traversable and countable. Currently, you can do intersection types in very hacky ways. So you can either create a new interface which extends both traversable and countable, but then all the classes that you want to be using this fashion, you need to make them implement the interface, which might not be possible if you using a library or other things like that. The other very hacky way of doing it is using reference and typed properties. You assign two typed properties by reference, one being traversable, one being countable, and then your actual property, you type alias reference it, with both of these properties. And then my PHP will check: does the property respect type A those reference? If yes, move to the next one. It doesn't respect type B, which basically gives you intersection types.

Derick Rethans 1:44

Yeah, I saw that in the RFC. And I was wondering like, well, people actually do that?

George Peter Banyard 1:49

The only reason I know that is because of Nikita's slide.

Derick Rethans 1:51

The thing is, if it is possible, people will do it, right. And that's how that works.

George Peter Banyard 1:56

Yeah, most of the times.

Derick Rethans 1:57

The RFC isn't actually called intersection types. It's called pure intersection types. What does the word pure do here?

George Peter Banyard 2:05

So the word pure here is not very semantic. But it's more that you cannot mix union types and intersection types together. The reasons for it are mostly technical. One reason is how do you mix and match intersection types and union types? One way is to have like union types take precedence over intersection types, but some people don't like that and want to explicit it grouping all the time. So you need to do parentheses, A intersection B, close parentheses, pipe for the union, and then the other type. But I think the main reason is mostly the variance, like the variance checks for inheritance are already kind of complicated and kind of mind boggling.

Derick Rethans 2:44

I'm sure we'll get into the variance rules in a moment. What is it actually what you're proposing to add here. What is the syntax, for example?

George Peter Banyard 2:52

So the syntax is any class type with an ampersand, and any other class type gives you an intersection type, which is the usual way of doing and.

Derick Rethans 3:01

When you say class types, do you also mean interfaces?

George Peter Banyard 3:04

Yes, PHP has a concept of class types, which are mostly any class in any interface. There's also a weird exception where parent and self are considered class types, but those are not allowed.

Derick Rethans 3:20

Okay, so it's just the classes that you've defined and the class that are part of the language but not a special keywords, self and parent and static, I suppose?

George Peter Banyard 3:28

Yes, the reason for that is standard types are not allowed to be part of an intersection, because nothing can be an integer and a string at the same time. Now, there are some of the built in types, which can be kind of true. You could have a callable, which is a string, because callables can be arrays, or can be a closure. But that's like very weird and not very great. The other one is iterable. If when you expand that out, you get redundant types, which we can talk about later. And the final thing is parent, self, and static, just makes for some very weird design questions, in my opinion, like, if you ask for something to be an intersection with itself, you basically can only enforce conditions on subclasses. You have a class and you say: Oh, I want it to return self, but also be countable for some reason, but I'm not countable. So if you extend me, then you need to be countable, but I'm not. So it's very weird. parent has kind of the very same weird semantics where you can ask a parent, but it's like, if the base class doesn't support it, and you ask for a parent to be an intersection, then you basically need the child to implement the interface and then a child to return the first child. If you do that main question. Why? Because I don't see any good reasons to do it. And it just makes everything harder.

Derick Rethans 4:40

You've only added for the sake of completeness instead of it being useful. Let's move on birds. You've mentioned which types are supported, which is class names and interface names. You already hinted a little bit at redundant types. What are redundant types?

George Peter Banyard 4:56

Currently, PHP already does that with union types. If you repeat the type twice in a union, you'll get a compile error. This only affects compiled time known aliases. If you use a use statement, then PHP knows that you basically using the same type. However you use a runtime alias, then it can't detect that.

Derick Rethans 5:13

A runtime alias, what's that?

George Peter Banyard 5:15

So if you use the function class_alias.

Derick Rethans 5:16

It's new to me!

George Peter Banyard 5:18

it technically exists. It also doesn't guarantee basically that the type is minimal, because it can only see those was in its own file. For example, if you say I want A and B, but B is a child class of A, then the intersection basically resolves to only B. But you can only know that at runtime if classes are defined in different files. So the type isn't minimal. But if you do redundant types, basically, it's a easy way to check if you might be typing a bug.

Derick Rethans 5:46

You try to do your best to warn people about that. But you never know for certain.

George Peter Banyard 5:51

You never know for certain because PHP doesn't compile everything into like one big program like in check. Static analyser can help for that.

Derick Rethans 5:59

Let's talk a little bit about technical aspects, because I recommend that implementing intersection types are quite different from implementing union types. What kind of hacks that you have to make in a parser and compiler for this?

George Peter Banyard 6:11

Our parser has being very weird. The parsing syntax should be the same as union types. So I just copy pasted what Nikita did. I tried it. It worked for return types without an issue. It didn't work with argument types, because bison, which is the tool which generates our parser, was giving a shift reduce conflict, which basically tells: Oh, I got two possible states I can go in, and I don't know which branch I need to go, because the PHP parser only does one look ahead. Because it was conflicting, the ampersand, either for the intersection type or for to mark a reference. Normally, if the paster is more developed, or does more look ahead, it is not a conflict. And it shouldn't be. Ilia managed to came up with this ingenious idea, which is just redefine the ampersand token twice and have very complicated names, and just use them in different contexts. And bison just: now I have no issue. It is the same token, it is the same character. Now that you have two different tokens it manages to disambiguate, like it's shift produce. So that's a very weird.

Derick Rethans 7:17

I'll have a look at what that actually does, because I'm curious now myself. Beyond the parser, I think the biggest and most complicated part of this is implementing the variance rules for these intersection types. Can you give a short summary of what a variance rules are, and potentially how you've actually implemented them?

George Peter Banyard 7:38

Since PHP seven point four, return types and up covariant, and parameter types are contravariant. Covariant means you can like restrict, we can be more specific. And contravariance means you can be broader or like more generic. Union types already gives some interesting covariance implications. Usually, you would think, well, a union is always broader than a single type, you say: Oh, I want either a traversable or accountable, it seems that you're expanding the type sphere. However, a single type can have as a subtype, a union type. For example, you say,:Oh, my base type is a Class A, and I have two child classes, which are B and C. I can type covariantly that I want either B or C, because B or C is more specific than just A. That's what union types over there allows you to do. And the way how it's implemented. And how to check for that is you traverse the list of child types, and check that the child type is an instance of at least one of the parents types. An intersection by virtue of you adding constraints on the type itself will always be more specific than just a single type. If you say: Oh, I want a class A, then more specifically, so I want something of class A and I want it to be countable. So you're already restrict this, which gives some very interesting implications, meaning that a child type can have more types attached to itself than a parent type. That's mostly due how PHP implements its type system, to make the distinctions, basically, I've added the flag, which is either this is a union, meaning that you need to check it is part of one, or it's an intersection. The thing with intersection types is that you need to reverse the order in how you check the types. So you basically need to check that the parent is at least an instance of one of the child types, but not that none of the child types is a super type of the parent type. Let's say you have class C, which extends Class B and Class B extends Class A. If I say let's say my base type is B to any function, and I give something which is a intersection T, any interface, this would not be a valid subtyping relation to underneath B. Because if you looked it was a Venn diagram in some sense, you've got A which is this massive sphere, you've got B which is inside it, and C which is inside it. A intersection something intersects the whole of A with something else, which might also intersect with B in a subset, but it is wider than just B, which means like the whole variance is very complicated in how you check it because you can't really reuse the same loop.

Derick Rethans 10:13

I can't imagine how much more complicated this gets when you have both intersection and union types in the same return type or parameter argument type.

George Peter Banyard 10:22

One of the primary reasons why it's currently not in the RFC, because it is already mind boggling. And although I think it shouldn't be that hard to like, add support for it down the line, because I've already split it mostly up so it should be easy to check: Oh, is this an intersection? Is this a union? And then you need to branch.

Derick Rethans 10:42

Luckily because standard types aren't included here, you also don't really have to think about coercive mode and strict mode for these types. Because that's simply not a thing.

George Peter Banyard 10:50

That's very convenient.

Derick Rethans 10:52

Is the future scope to this RFC?

George Peter Banyard 10:54

The obvious future scope is what I call composite types, is you have unions and intersections available in the same type. The main issue is mostly variance, because it's already complicated, adding more scope to it, it's going to make the variance go even harder. I think with most programming languages, the variance code is always complicated to read. While I was researching some of it, I managed to hit a couple of failures, which where with I think was Julia and the research paper I was it was just like focusing on a specific subset. And like, basically proving that it is correct. It's not a very big field. Professors at Imperial, which I've talked to, have been kind of helpful with giving some pointers. They mostly work with basically proper languages or compiled languages, which have this whole other set of implications. Apparently, they have like a bunch of issues about how you normalize the types like in an economical form, to make it easier to check. Which is probably one of the problems that will need to be addressed, when you get like such a intersection and union type. First, you normalize it to some canonical form, and then you work with it. But then the second issue is like how do you want the composite types to actually be? Is it oh, you have got parentheses when you want to mix and match? Or can you use like union precedence? I've heard both opinions. Basically, some people are very dead against using Union as a precedent.

Derick Rethans 12:14

My question is going to be, is this actually something people would use a lot?

George Peter Banyard 12:21

I don't think it would be used a ton. The moment you want to use it, it is very useful. One example is with the PSRs, the HTTP interfaces. Or if you want the link interface. Combining these multiple things gets it convenient. One of the reasons why I personally wanted as well, it's for streams. So currently, streams don't have any interface, don't have any classes. PHP basically internally checks when you call like certain string methods. For example, if you try to seek and you provide a user stream, it basically checks if you implement a seek method, which should be an interface. But you can't currently do that. Ideally, you would want to stream maybe like a base class, instead of having like a seekable stream, and rewindabe stream, or things like that. You basically just have interfaces. And then like if somebody wants a specific type of stream, just like a stream, which is seekable, which is rewindable. And other things. We already have that in SPL because there's an iterator. And we have a seekable iterator interface, which basically just ask: Oh, this is there's a seek method. I think it depends how you program. So if you separate the many things into interfaces, then you'll probably use intersections types a lot. If you use a maybe a more traditional PHP code base, which uses union types a lot. Union types are like going to be easier. And you want to reduce that.

Derick Rethans 13:32

Would you think that lots of people already use union types because it's pretty new as well. Isn't it?

George Peter Banyard 13:38

Union types are being implemented in various different libraries. PSRs are updating the interfaces to use union types. One use case, I also have a special method, which was taken the date, it takes a union of like a DateTime interface, a string or an integer. Although intersections types are really new, you hear people when union types were being introduced, you heard people saying, I would promote bad cleaning habits, you shouldn't have one specific type. And if you're using a union, you have a design issue. And I had many people complaining to me why and intersection types of see? Why they haven't intersection types being introduced first, because intersection types are more useful. But then you see other people telling us like, I don't see the point in intersection types. Why would you use an intersection type, just use your concrete class, because that's what you're going to type anyway.

Derick Rethans 14:21

I can give you a reason why union types have implemented first, over intersection types, I think, which is that it's easier to implement.

George Peter Banyard 14:28

It's easier to implement. And it's more useful for PHP as a whole, because PHP functions accepts a union or return a union. Functions return false for error states instead of null. It makes sense why union types were introduced first, because they are mostly more useful within the scope of what PHP does.

Derick Rethans 14:46

Do you think you have anything else to add about intersection types? At the moment, it's already up for voting, when is that supposed to end?

George Peter Banyard 14:54

So the vote is meant to end on the 17th of June.

Derick Rethans 14:57

At the moment I see there's 15 votes for and two against so it's looking good. What's been your most pushback on this? If there was any at all?

George Peter Banyard 15:05

Mostly: I don't see the point in it. However, I do think proper reasons why you don't want it, compared to like some other features where it's more like have thoughts on what you think design wise. But it is undeniable that you you add complexity to the variance. And to the variance check. It is already kind of complicated. I have like a hard time reading it initially. There's the whole parser hackery thing, which is kind of not great. It's probably just because we use like a restricted parser because it's faster and more efficient.

Derick Rethans 15:36

I think I spoke with Nikita about parsers some time ago and what the difference between them were. If I remember which episode it was all the to the show notes.

George Peter Banyard 15:44

And I think the last reason against it is that it only accepts pure intersections. You could argue that, well, if you're adding intersections, you should add the whole feature set. It might impact the implementation of type aliases, because if you type alias T to be a union of A and B, and then you use type T in an intersection, you basically get a mixture of unions and intersections, that you need to be able to work with. The crux of this whole feature is the variance implementation. And being able to rationalize the variance implementation and been to extend it, I think it's the hardest bit.

Derick Rethans 16:18

I guess the next thing still missing would be type aliases, right? Like names for types, which you can't define just yet, which I think you also mentioned in the RFC is future scope.

George Peter Banyard 16:29

Yeah.

Derick Rethans 16:30

Thank you, George, for taking the time today to talk to me about pure intersection types.

George Peter Banyard 16:36

Thanks for having me on the show.

Derick Rethans 16:41

Thank you for listening to this installment of PHP internals news, the podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 87: Deprecating Ticks

PHP Internals News: Episode 87: Deprecating Ticks

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Deprecating Ticks" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi I'm Derick, welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 87. Today I'm talking with Nikita Popov about a much smaller RFC this time: Deprecating Ticks. Nikita, would you please introduce yourself.

Nikita Popov 0:34

Hi Derick, I'm Nikita, and I'm working on PHP core development on behalf of JetBrains.

Derick Rethans 0:40

Let's jump straight into what this RFC is about, and that's the word ticks. What are ticks?

Nikita Popov 0:46

Ticks are a declare directive,. You write declare ticks equals one at the top of your file, and then PHP we'll call a tick function after every statement execution. Or if you write ticks equals two, then as we'll call it the function after every two statement executions.

Derick Rethans 1:05

Do you have to specify which function that calls?

Nikita Popov 1:08

Of course, so there is also a register tick function and unregister tick function and that's how you specify the function that should be called rather the functions.

Derick Rethans 1:17

How does this work, historically, because the RFC talks about the change being made in PHP seven?

Nikita Popov 1:22

Technically ticks work by introducing an opcode after every statement that calls the tick function depending on current count. The difference that was introduced in PHP seven is to what the tick declaration applies. The way PHP language semantics are supposed to work, is that declare directives are always local. The same way that strict types, only applies to a single file, ticks should also only apply to a single file. Prior to PHP seven, it didn't work out way. So if you had declare ticks, somewhere in your file, it would just enable ticks from that point forward. If you included the different file or even if the autoloader was triggered and included a different file that one would also make use of ticks. That was fixed in PHP seven, so now it is actually file local, but that also means that the ticks functionality at that point behaviour became, like, not very useful. Because usually if you want to use tics you actually want them to apply it to your whole codebase. There are ways around that. I'm afraid to say that people have approached me after this RFC and told me that they actually do that. The way around that is to register a stream wrapper. It's possible in PHP to unregister the file stream wrapper and register your own one, and then it's possible to intercept all the file includes and rewrite the file contents to include the declare ticks at the top of the file. I do use that general mechanism for real things in other places, but apparently people actually use that to like instrument, a whole application with ticks, and essentially restore the behaviour we had in PHP 5.

Derick Rethans 3:03

What was the intended use case for ticks to begin with?

Nikita Popov 3:07

Well I'm not sure what was the intended use case, but at least it was the main use case, and that's signal handling. In the PCNTL extension allows you to register a signal handler, and when the signal arrives, we can't just directly call that signal handler, because signals are only allowed to call functions without that our async signal safe. Which excludes things like memory allocation, and a lot of other things that PHP uses. What we do instead is we only set the flag that okay signal has arrived and then we have to actually run the signal handler at some later point in time. In PHP five, that worked using ticks. You declare ticks, and the PCNTL extension registered the tick handler, and then after this flag was set, it would execute your callback on the next tick. In PHP seven, an attentive mechanism was introduced, that is based on virtual machine interrupts. Those were originally introduced for time-out handling, because there we have a similar problem, that when timeout arrives, we might be in some kind of inconsistent state, like the middle of the allocator right now, and if we just bail out at that point, we are likely to see crashes down the road. So that was a significant problem in PHP five. PHP seven changed that. We now set an interrupt flag on timeout, and then the virtual machine checks this flag at certain points. The interrupt flag is not checked after every instruction, but only, like, just often enough to make sure that it's checked, at some point. So that you can't like go in an infinite loop, that ends up never checking. These points are basically function calls, and jumps that go higher up in the function, PCNTL signals can now use the same mechanism. If you call PCNTL async signals true, then those will also set the interrupt flag, and execute the signal handler on the next opportunity. The next time the interrupt flag is checked. The nice thing about that is that it's essentially free. I mean we already, we already have to do these checks for the interrupt like anyway, adding the handling for PCNTL signals doesn't add any cost on top. Unlike ticks, which have to be like executed on every instruction or at least regularly, and that does add significant cost.

Derick Rethans 5:28

Execution time itself because it's an opcode that needs to be executed.

Nikita Popov 5:32

Exactly.

Derick Rethans 5:33

So what are you proposing to do but the ticks in PHP eight one then?

Nikita Popov 5:36

I want to deprecate that. So both the declared directive itself, and the register tick function, unregister tick function.

Derick Rethans 5:44

How could users emulate the same behaviour as ticks allows them to do so now?

Nikita Popov 5:49

That's a good question. As I mentioned, if the use case is, use case of ticks was signal handling, then by using async symbols. If it was something else, then you have a problem. My assumption when writing this RFC was basically that signal handling was really the main remaining use case of ticks, because other use cases require this kind of you know stream wrapper instrumentation, and I didn't expect that people will be crazy enough to use something like that in production.

Derick Rethans 6:21

Hopefully they catch these rewritten files?

Nikita Popov 6:23

Probably yeah. I think it's possible to make this integrate with opcache. If you use it for other purposes, then, I don't think there is a really good replacement. So I think what they use it for is some kind of well instrumentation, so profiling, memory profiling, for example, and the alternative there of course is to use a tool that is appropriate for that job, for example, Xdebug contains a profiler, but of course it is not a production profiler, but I think there are also production profilers.

Derick Rethans 6:54

As far as I know all the production or APM solutions. They do this on their own without having to use sticks. They don't need any user land modifications.

Nikita Popov 7:03

Yeah, definitely. All the APM solutions support this, they use internal handlers.

Derick Rethans 7:08

Because it's actually removing functionalities that some people use, what's the reaction been to removing this functionality?

Nikita Popov 7:14

Well on the mailing list at least positive, but as I mentioned at least some people have like pointed out on the pull request that they are using the functionality.

Derick Rethans 7:23

Enough in such a way to sway for not deprecating them? What is the benefits of getting rid of ticks, if you don't use them?

Nikita Popov 7:31

That's, I think the thing, that there is not really a big benefit to getting rid of them. Like they don't add a lot of technical complexity to the engine. They're pretty simple in that sense. I haven't seen those responses. I'm kind of rolling a bit unsure if we should really remove them, because you could argue that well they don't really hurt anyone. I do have to say that I think all the things that people use sticks for, all the cases I have heard about, and all of those cases ticks are not the right way to solve the problem. They are not the right way to solve the signal handler problem, they are not the right way to solve the profiling problem. And the other one I heard is also they're not the right way to solve the heartbeat problem, to make sure a service stays connected. While people do use them I think they use them for questionable purposes.

Derick Rethans 8:24

Developers, if they're using something to rewrite the PHP file to introduce ticks, they can also technically rewrite a file to introduce calls to their own functions, after every statement.

Nikita Popov 8:34

Yes, I actually have a very nice PHP fuzzing project that rewrites PHP files to introduce instrumentation functions at certain points. That needs a lot more control than ticks, because it's interested in branching statements in particular. That is definitely also possible, but it's kind of even more crazy than just adding ticks. If you're doing it like this, I think, if we want to keep ticks, then we should change ticks from a declare directive to a ini_set, because this kind of rewriting of files to introduce takes that's like not a great solution. On the other hand, that does mean that if you are, I don't know a library, implementing some code and expecting that, you know, it just runs normally, then someone can with by enabling an ini setting will suddenly run code in the middle of your library file that's like essentially any point. So enabling ticks us a major behaviour change, that's something we really don't like to have in ini settings which is I guess also, why does it declare in the first place, because that limits the scope. And you have to go out of your way if you want to not limit it using this rewriting hack. So I'm not really sure ultimately what to do here.

Derick Rethans 9:44

Are you thinking of bringing this up for vote before PHP eight dot one's feature freeze?

Nikita Popov 9:49

If I decide to go for it, then definitely before. I'm just not completely sure on this topic yet.

Derick Rethans 9:55

it'd be interesting to, to hear what other people think about removing this. I have no opinion about this. Other features I do but in this case, I'm happy with them being there, I'm happy with them not being there, because it's something I'm using myself. In any case, thank you for going through this RFC with me today, and we'll see what happens.

Nikita Popov 10:14

Thanks for having me, Derick.

Derick Rethans 10:18

Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug and debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 86: Property Accessors

PHP Internals News: Episode 86: Property Accessors

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Property Accessors" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi I'm Derick. Welcome to PHP internals news, a podcast dedicated to explain the latest developments in the PHP language. This is episode 86. Today I'm talking with Nikita Popov about his massive property excesses RFC. Nikita, would you please introduce yourself?

Nikita Popov 0:32

Hi Derick, I'm Nikita, and I do work on PHP core development, on behalf of JetBrains.

Derick Rethans 0:39

This is probably the largest RFC I've seen in a while. What in one sentence, are you proposing to add to PHP here?

Nikita Popov 0:46

I would say it's an alternative to magic get and set, just for one specific property instead of all of them. That's the technical side. Maybe I should say something about the like motivation behind it, which is that since PHP seven four, we have type properties, that at least for me personally with that feature, the need to have this typical pattern of private property for storage, plus a public getter and setter methods, the main motivation for that has kind of gone away, because we can now use types to enforce any contracts on value. And now these getter and setter methods most if you like boilerplate. So the idea with accessors, at least my idea with accessors is that you really shouldn't use them. You should just have them as a backup option. You declare a public property in your class, and then maybe later, years later, it turns out that okay, that property actually requires additional validation. And right now if you have a public property, then you don't really have a good way of introducing that. Only way is to either break the API contract by converting the property into getter/setter methods where you can introduce arbitrary code, or by using magic get/set, which is definitely possible and persist the API contract, it's just fairly ugly.

Derick Rethans 2:09

You changes the public property that people could read into a private one. And because it's private, the set and get metric methods are being called.

Nikita Popov 2:18

Exactly.

Derick Rethans 2:19

This RFC is titled Property accesses, how do these improve on the situation?

Nikita Popov 2:24

So I think there are really two fairly orthogonal parts to this RFC. The first part is implicit accesses that don't have any custom behaviour, and just allow controlling the behaviour of properties a bit more precisely. In particular, the most important part is probably the asymmetric visibility, where you have a property that's publicly readable, but can only be set from within the class. So public read/ private write. I think that's a, maybe the most common requirement. The second part is where you can actually introduce some custom behaviour. So where you can say that okay, the get behaviour for this property looks like this, and the set behaviour, it looks like this. Which is essentially exactly the same as what magic get/set does, just for a single property.

Derick Rethans 3:10

For example, when you then do set, or you can add additional validation to it.

Nikita Popov 3:14

Exactly. Originally, you had a simple public property, then you can add a setter that checks okay this string cannot be empty.

Derick Rethans 3:23

Okay, what it's the syntax that you're proposing?

Nikita Popov 3:26

I went with these essentially the same syntax that's being used in C#. Looks like you write public foobar, and then you have this sort of semi colon you have a code block. And this code block contains two accessors, so then you have something like get, and another code block that specifies the get behaviour, and set, and the code block that specifies the set behaviour and so on.

Derick Rethans 3:52

The RFC talks about implicit and explicit implementations of these getter and setter accessors. What is the difference between them and how does it look different in syntax?

Nikita Popov 4:03

Yeah so the difference is, either you can write just get semi colon, set semicolon, that's an implicit implementation, or you actually specify a code block with real custom behaviour. To do the implicit implementation, you're saying that this is really a normal property, and PHP automatically manages the storage for you, is that you have this more fine grained control over how it works. Namely what you can do is you can say that you have get and private set. But that's a property that's publicly read only and internally writeable. You can write just get without set, in which case it's a real read only property both publicly and privately, or to be more precise, it's an init once property so you can assign to it once.

Derick Rethans 4:52

How do you keep track of the init once?

Nikita Popov 4:53

It's same mechanism as for Type Properties, where we distinguish between an initialized and an uninitialized property. You can assign to an uninitialized property, but you can't assign to an initialized one, if it's read only. The only maybe problem there is that this mechanism, requires that the property actually is uninitialized to start with, which means that for accessors you don't have any default values. To say there is no implicit default value, no implicit null value. If you want to have a default value the same as with type properties you have to specify it explicitly. Specifying a default value really only makes sense if the property is both readable and writable. For Read Only properties, if you specify the default then you will you can change that.

Derick Rethans 5:37

You have basically have created a constant.

Nikita Popov 5:39

Yes, it is essentially a constant.

Derick Rethans 5:41

You mentioned already, PHP seven four introduced type properties. How do these types interact with the setter and getter accessors?

Nikita Popov 5:50

I would say in the obvious way. The getter is required to return type of property, modulo the usual weak typing conversions, and the setter also checks before it's called whether the passed value matches the type or not. But enforces that matches the type.

Derick Rethans 6:08

This does mean that if you provide an explicit implementation for the set accessor, you also need to specify the parameter name?

Nikita Popov 6:15

No, or you can specify the parameter name, and if you don't then that's just passed in as the value variable. It's also inspired by how C# and Swift do it. I mean there are some possible variations here we could always require an explicit name, some people for that, or I also heard that some people would like to have the name of this implicit variable match the name of the property, instead of always being just value.

Derick Rethans 6:41

Would you have to specify the type though?

Nikita Popov 6:43

You wouldn't have to and you're actually not allowed to. So the accessor implementation is somewhat strict about not allowing you to do anything that would be redundant because otherwise, you know, there are quite a lot of extra things you could be adding everywhere.

Derick Rethans 6:56

That's the same way as marking a property as private. And then the accessors as private as well. Right?

Nikita Popov 7:03

Yeah exactly. So, then that will also say: if the property is already private you can't, again say that the accessors also private.

Derick Rethans 7:11

I think that's the wise thing, otherwise people go overboard with adding private and final and whatever everywhere anyway right.

Nikita Popov 7:18

One could argue that it's really not our business and this is a coding style question, but you know it's better to not leave people, with the option of doing stupid things.

Derick Rethans 7:28

I saw in the RFC that it is also possible to use references with the get accessor. Does this complicated implementation and the idea of this RFC, a lot, or just a little?

Nikita Popov 7:39

I think the important context to keep in mind here is that we already have magic get set, and the accessors are, like, largely based on their semantics. Magic getters already have this distinction between returning by value and returning by reference. The by reference return value is primarily useful for two cases. One and this is really the important one, is if you're working with arrays, any write operation on an array like setting an element or appending an element, those require that the getter returns by reference, because PHP will actually do the modification on the reference. Because some people asked about that. Why can't we just like get the array using the getter, then make the change and then assign back using the setter. That would theoretically work, but it would be extremely inefficient, and the reason is that this breaks PHP's copy on write mechanism. If the error is returned from the getter, then we have one array inside the property. And we have one copy of the array inside the property, and as the return value. Then we change the return value and the resource is now shared, we actually have to copy the whole array, and then we assign it back. So effectively what we do is we copy the array, we do single element change, and then we copy the array, we do a single element change and then we destroy the old array. That works in theory, but it's so inefficient that we would not want to promote this kind of usage.

Derick Rethans 8:42

The way around is of course, is having an implicit methods on the class to make this change to the array itself right?

Nikita Popov 9:10

That would be another option. Problem is that you will need a lot of methods, I mean it's not just a matter of setting a single element or unsetting an element, but you can also set like a deep element where you're not modifying the outermost array but, like, a multi dimensional array. You would actually have to pass through that information somehow as well. I don't think there is a simple solution to that problem beyond the reference based solution that we currently use.

Derick Rethans 9:34

I saw people arguing about not bothering with references in this new implementation at all, but I think you've now made a good case for keeping them.

Nikita Popov 9:42

Effectively not bothering with references just means not supporting that array use case. Which might be, maybe a reasonable limitation, especially if we like make a distinction and supported for the implicit accessor case where we can, you know, do internal magic to support that and not support it in the explicit accessors case. I mean, people were arguing that this reduces the complexity of the proposal, but it kind of also increases the complexity because now we are doing something else for the accessors and we're doing for the magic get/set, where we already have this established mechanism. I'm not really convinced by that.

Derick Rethans 10:20

And I also think it creates inconsistencies in the language itself because it does something different with an implicit or explicit accessor, as well as it being different between the original underscore underscore get magic method as well.

Nikita Popov 10:34

It's not a secret that I'm not a big fan of references, and I would certainly love to get rid of them, but it's a hard problem, and this array modification behaviour for magic get or for get accessors is certainly a large part of that problem, and I just don't have a good solution for it.

Derick Rethans 10:52

I don't either. The RFC also goes into great detail about inheritance and variance. Would you have a few words on that?

Nikita Popov 11:00

I think mostly inheritance works like inheritance does for methods, at least that's how it's supposed to work. Of course there are some interactions, because you can for example mix real properties and accessor properties. In which case, if you have parent accessor property, you can always replace it with a normal simple property, because normal properties they support all operations that accessor properties do. What you can't do is the other way around. If you have a parent normal property, then you can't replace that with an accessor property. And reason is that it does have some limitations. Not a lot, but there are some limitations. One of them is related to references, I mean, we're already talking about this topic. What the by reference get allows is taking a reference to the property, so you can do something like a reference equals the property. What you can't do is the other way around the property reference equals something else. So you can't assign a new reference into the property, that just doesn't work on a pretty fundamental level, because it would require an additional set handler for set by reference. As we don't particularly love references, adding a new mechanism to support that is not a very popular choice.

Derick Rethans 12:20

Variance wise, I guess, the same rules apply as for normal properties and property types?

Nikita Popov 12:27

Approximately. Properties are apparently invariant, so you can't change the type or I mean you can change it but it has to be an equivalent type. If you have a read only property, with only a getter, then the implementation makes the type covariant, which means you can use a smaller type in the child class. This is similar to how if you have a getter method, you could also give it a smaller type in the child class. The converse case, if you have a property that can only be set, then the type is contravariant, you can have larger type in the child class, though I should say that properties that can only be set are somewhat odd and really only supported for the sake of completeness, so maybe it might be worthwhile to drop the type specific behaviour there, because a set only property should already be really rare, and then set property with a contravariant inheritance that's like a edge case of an edge case.

Derick Rethans 13:24

Would it even make sense to support set only properties?

Nikita Popov 13:27

Not sure. So for the C#, implementation, I think they don't support this and there is a StackOverflow question about that, and people try to convince their, that they should support this, that the are really use cases. Currently the imagined use cases are along the lines of injecting values into a class, so using setter injection, just that now it's property based setter injection. Okay, I'll be honest I think it doesn't make sense.

Derick Rethans 13:55

To be fair, I don't think either. It would reduce the length of the RFC a little bit.

Nikita Popov 14:00

A little bit, yes.

Derick Rethans 14:01

Can you say a few words about abstracts, traits, private accessors shadowing and things like that. So a lot of complicated words, maybe you, you can distil that into something slightly simpler.

Nikita Popov 14:12

Well I think actually abstract properties are worth mentioning. In particular, the fact that you can now specify properties inside interfaces. If you have public properties, then it makes sense to have them really on the same level as public methods, so they are part of the API contract, and as such should also be supported in interfaces. Typically what the RFC allows is, you can't specify a simple property in the interface, but you can specify an accessor property, which tells you which operations have to be supported. So you can't have a property declaration that says, it just has a get accessor, or it has get and set. The implementation of course can always implement more, so if the interface requires get, then you can implement both get and set, but it has to implement at least get, either through an accessor offer another property. I think in most cases implementation will just be a normal property.

Derick Rethans 15:03

Because a normal property would implement an implicit get already anyway?

Nikita Popov 15:07

Yeah.

Derick Rethans 15:08

How do property accessors tie in, or integrate with constructor property promotion?

Nikita Popov 15:13

They are supported and promotion with the limitation that it's only implicit accessors. If you use constructor promotion, then you can specify your read only property in there, or property that is Public Read/ private write. You cannot specify a property with complex behaviour in there. This is mainly because it would mean that you embed large code blocks into the constructor signature, which is I think, pushing the limits of shorthand syntax, a bit. Like there is nothing fundamental that will prevent it, it's more a question of style.

Derick Rethans 15:50

The RFC talks a little bit about how, or rather what happens if you use foreach, var_dump, or an array cast on properties with explicit accessor. What are the restrictions here? Is something chasing from normal standard properties like we currently have.

Nikita Popov 16:03

I don't think so. So here is once again the case where we have this distinction between the implicit accessors, which are really just normal properties with limitations. So those show up in var_dump and array cast, foreach, as usual. And we have explicit accessors, which are really virtual properties, so they don't have any storage themselves. Any storage to use, you have to manage separately somehow. So, these don't show up in var_dump, foreach, and so on. Both these actual computed properties, they don't show up because that would require us to actually call all the accessors if you do foreach and that seems rather dubious to me.

Derick Rethans 16:44

How this will work for internal API's that some extensions use to access, like a list of all the properties, for say, for a debugger.

Nikita Popov 16:51

It'll work the same way as var_dump. I mean, in the end it's all, well it's not quite based on the same API's, but still, the answer is the same. You only get those properties that have some kind of backing storage, and those are only the ones that are either normal properties, or the ones with implicit accessors.

Derick Rethans 17:09

That means I need to go find out a way how to be able to read the ones with explicit accessors.

Nikita Popov 17:14

Yeah, if you want to. I don't think that the debugger should read those by default, because that means that doing a dump, will have side effects, which is not ideal, but maybe you want to have an option to show them.

Derick Rethans 17:26

That's something for me to think about, because I'm pretty sure people are going to want to see the contents of these properties, even in a debugger, even though that could mean that are side effects, which I'm not keen on.

Nikita Popov 17:36

I guess that's one of the, I would say advantages of using this over just magic get/set, because actually know which properties you're supposed to look at, with for magic get/set you just don't know at all.

Derick Rethans 17:51

The RFC talks a little bit about the performance impacts and although I saw the numbers I didn't actually read them, when preparing for this recording. What are the performance impacts for implicit accessors as well as explicit ones?

Nikita Popov 18:02

Impact is basically if you use implicit accessors that has similar performance to plain properties, performance is a bit worse. The reason is essentially that we have some limitations on caching. So we can't just cache it as if it were a normal property, because it could have asymmetric visibility. And we reuse the same cache slots for reads and writes. I've been thinking about maybe splitting that up but at least for now there is a small additional performance impact of using implicit accessors, but it's not really significant. On the other side if you use explicit accessors. Those are expensive, they are not quite as expensive as using magic get/set, but they are more expensive than using normal method calls. Reason is basically their normal method calls, they are very optimized, and they do not have to re enter the virtual machine, so we just stay in the same virtual machine loop, and we just switch to different stack frames. For magic get/set we actually have to like recursively call the virtual machine, because we don't have a good point to re enter it, at least based on our current API's. And we also have to deal with some additional stuff, particularly the fact that magic get/set and property accessor as both, they have recursion guards. Normally if you recurse methods in PHP, we don't do any checks about that. Xdebug does, but PHP itself doesn't, so you can infinitely recurse and PHP is fine. The only thing that happens is that at some point you'll run out of memory.

Derick Rethans 19:37

Or when extensions are loaded such as Xdebug, you'll actually still get a stack overflow.

Nikita Popov 19:41

So that's something we should still be addressing, at least the baseline behaviour that you can get to that memory limit error. For properties will set have recursion guards, which say that if you recursively access a property in magic get/set, that it will not call magic get/set again and instead, access the property as if they didn't exist.

Derick Rethans 20:01

Instead of throwing in an error?

Nikita Popov 20:03

Yeah. For property accessors I'm actually throwing an error on recursion, and the reason for that is if we didn't throw an error, then this would end up accessing dynamic property of the same name as the accessor, which would technically work, but it's very likely not what the programmer actually intended. So it's going to be really inefficient because you actually have to allocate space for the dynamic properties and access for those. So if you wanted to have some kind of backing storage for the property, then you should just explicitly declare it and access that, rather than accessing something with the same name and implicitly creating a dynamic property.

Derick Rethans 20:41

Yeah, that sounds all very complicated.

Nikita Popov 20:44

It's cleaner to just make it an error and let the programmer fix it, instead of PHP try to fix it for you.

Derick Rethans 20:51

Are there any BC considerations about the introduction of property accessors?

Nikita Popov 20:55

Not strictly, but I'm sure that it's going to break, various assumptions for people, or at least in the sense that, right now, most assumptions should already be broken through magic get/set. I mean you can always have this kind of magic behaviour. If we have accessors this is probably going to be a lot more common, and people will have to deal with things like properties being publicly readable, but privately writable much as because someone very rarely manually implements that behaviour, but because the language though has native support for it and it's going to be common.

Derick Rethans 21:28

We spoke a little bit about all the different sticking points in his RFC, for example with references, but there's one other thing and I think it's an argument you make somewhere on the bottom of the RFC, that there is a separation between implicit and explicit property accessors. I'm wondering whether it would make sense to consider adding whether the implicit part of this RFC first and then maybe later look at adding explicit property accessors.

Nikita Popov 21:54

That's really the main sticking point, and also my own problem with the RFC. I mean, you mentioned at the very start, that this is a very long RFC and still a little bit incomplete so it's going to be longer. It's a fairly complex feature that has complex interactions with other features in PHP. The implementation is actually, maybe less complex, then you think, given the RFC length. The main concern I have is that, at least for me personally the most useful part of the RFC, are the read only properties. The read only properties and the like Public Read, Private Write properties. I think these two cover like 90% of the use cases, especially because if you have a property that is only publicly readable, then you don't really have to be concerned about this case where you have to, later on, add additional validation. I mean after all the property is read only, or you control all the sets because they're private. There is no danger of introducing an API break, because you have to add additional validation. I think like the largest part of the use case of the whole accessors proposal will be covered by these two things, Or maybe even just one of these two things, that's a bit of a philosophical question. There are some people who think we should have just public read / private write and no proper read only properties, because that like looks the same from the user perspective, but still gives you more flexibility. I think that's like the most important use case, and we could implement that part with a lot less language complexity. So the question is really does it make sense to have this full accessor proposal, if we could get the most useful part as a separate simpler feature, and, well, I heard differing opinions on that one. I was actually pretty surprised that their reception of the on like a full accessors proposal was fairly positive. I kind of expected more pushback, especially as, this is the second proposal on the topic, we had earlier one with, like, similar syntax even though different details, and that one did fail.

Derick Rethans 24:02

How long ago was that?

Nikita Popov 24:03

Oh that was quite a while actually, at least more than five years.

Derick Rethans 24:06

I think that the mindset of developers has changed in the last five to 10 years, like introducing this 10 years ago would never happened, or even typed properties, right. It would never have happened.

Nikita Popov 24:17

That's true.

Derick Rethans 24:19

Do you have any idea when you're going to put us up for a vote? Because, of course, PHP 8.1 feature freezes coming up in not too far away from now.

Nikita Popov 24:28

Yeah, I'm not sure about that. I'm still considering if I want to explore the simpler alternatives, first. There was already a proposal. Another rejected proposal for Read Only properties, probably was called write once properties at the time. But yeah, I kind of do think that it might make sense to try something like that, again, before going to the full accessors proposal or instead.

Derick Rethans 24:54

Do you have anything else to add?

Nikita Popov 24:56

What are your thoughts on this proposal, and the question at the end?

Derick Rethans 24:59

I quite like it, but I also think that it might make sense to split it up. I'm always quite a fan of splitting things up in smaller bits, if that's possible too, and still provide quite a lot of use out of it. And that's why I was asking whether it makes sense to split it up into the implicit part and the explicit part of it, especially because it makes the implementation and the logic around it quite a bit easier for people to understand as well.

Nikita Popov 25:24

It's maybe worth mentioning that Swift also has a similar accessor model but it is more like a composition of various different features like read only properties, properties with asymmetric visibility, and then finally properties with like fully controlled, get and set behaviour rather than this C# model where everything is modelled using accessors with appropriate modifiers. So there is certainly precedent in other languages of separating these things.

Derick Rethans 25:55

Something to ponder about, and I'm sure we'll get to a conclusion at some point. Hopefully some of it before PHP eight one goes and feature freeze, of course. We've been chatting for quite a while now, I think we should call it the end for this RFC. Thank you for taking the time today to talk about property accessors.

Nikita Popov 26:11

Thanks for having me, Derick.

Derick Rethans 26:12

Thank you for listening to this installment of PHP internals news, a podcast, dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 85: Add IntlDatePatternGenerator

PHP Internals News: Episode 85: Add IntlDatePatternGenerator

In this episode of "PHP Internals News" I discuss the Add IntlDatePatternGenerator RFC with Mel Dafert (GitHub).

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi I'm Derick, welcome to PHP internals news, the podcast, dedicated to explain the latest developments in the PHP language. This is episode 85. Today I'm talking with Mel Dafert about the "Add Intl Date Pattern Generator RFC" that she's proposing for inclusion into PHP 8.1. Mel would you please introduce yourself?

Mel Dafert 0:35

Hello, I am Mel. I've been working professionally with PHP for about three years. Recently I started reading the internals mailing list in my free time, but this is my first time contributing.

Derick Rethans 0:46

What made you think starting to read the PHP internals mailing list?

Mel Dafert 0:50

I generally like reading mailing lists and issue trackers. And since I work with PHP, it was interesting to read what's, what's happening.

Derick Rethans 1:02

That's what I'm trying to read this podcast as well of course; explaining what happens in the PHP development. But let's get to your RFC. What is the problem that you're trying to solve for this?

Mel Dafert 1:14

Currently, PHP exposes the ability for locale dependent date formatting with the Intl Date Formatter class. It is basically only three options for the format: long, medium and short. These options are not flexible enough in some cases, however. For example, the most common German format is day dot numerical month, dot long version of the year. However, neither the medium nor the short version provide this, and they use either the long version of the month, or a short version of the year, neither of which were acceptable in my situation.

Derick Rethans 1:47

I realize that you basically ran into a problem that PHP wasn't doing something you wanted to do it. But what made you actually wanting to contribute this?

Mel Dafert 1:57

I ran into this exact problem at work where I wanted to format dates in this specific way. After some research, I found out that ICU, the library that powers Intl Date Formatter, exposes exactly this functionality already. It would be relatively easy to wire this up into PHP and expose it there as well. I also found in a bug report that other people had this problem as well, so I decided to try my best at hacking at the PHP source and make it available to everyone, using PHP.

Derick Rethans 2:25

Had you ever seen a PHP source code before?

Mel Dafert 2:28

I don't think so. No.

Derick Rethans 2:29

But you are familiar with C a little bit?

Mel Dafert 2:32

On a very basic level, yes.

Derick Rethans 2:34

As part of this RFC What are you trying to suggest to add to PHP?

Mel Dafert 2:39

ICU exposes a class called date time pattern generator, which you can pass a locale and so called skeleton and it generates the correct formatting pattern for you. Skeleton just includes which part are supposed to include it, to be included in the pattern, for example the numerical date, numerical month, and the long year, and this will generate exactly the pattern I wanted earlier. It is also a lot more flexible, for example the skeleton can also just consist of the month and the year, which was also not possible so far. I am proposing to add a Intl Date Pattern Generator class to PHP, which can be constructed for locale, and exposes the get best pattern method that generates a pattern from a skeleton for that locale.

Derick Rethans 3:22

The skeletons, what do you specify in these skeletons?

Mel Dafert 3:27

It's a similar format to the pattern itself. For example, it's lowercase y lowercase y uppercase M uppercase M, would give you only the year and only the month, if I'm correct, that's exactly what the skeleton looks like.

Derick Rethans 3:43

But it puts it in the right order?

Mel Dafert 3:45

It puts it in in the right order, and in some cases also adds extra characters, or even changes the format slightly, depending on the locale.

Derick Rethans 3:55

So it is a bit of a flexible way to tell the Intl extension to format them in a slightly more, well how do you say this, a slightly more intelligent way than what the standard, long, short and medium constants do for you.

Mel Dafert 4:11

Exactly.

Derick Rethans 4:12

Why is it so important that you get these formats, right, or rather I should say, how do these locales influence formats and why is this important?

Mel Dafert 4:21

There are conventions of how to format dates and times vary rather strongly between languages and country. In Austria, for example, nobody would expect to understand the US format of month slash day last year. I assume people in England may have the same issue.

Derick Rethans 4:38

I think everybody has that issue except for people in the US.

Mel Dafert 4:42

But that only shows the importance of using a format that people are used to and understand. Other languages like mainland Chinese even have the words for day and month included in the format, as far as I understand. I don't speak Chinese.

Derick Rethans 4:59

Neither do I, but a long time ago when I, when I added the date time support, not Intl, but PHP standard date time support, I also looked at locales that operating systems have. And even these locales, which is not something that Intl uses now, also encode these extra characters at least for Japanese, so that was interesting to see there as well.

Mel Dafert 5:22

There is a lot of sometimes somewhat unexpected formats.

Derick Rethans 5:27

And I think German sometimes once the add the in front, and sometimes behind and things like that. I know there's lots of little intricacies, yes. I see that he RFC makes an argument about which name to pick for the new class. Can you elaborate on the two different options that are?

Mel Dafert 5:44

Yes, this is certainly for us and what I would call bike shedding. ICU has something of an inconsistency in its naming. The formatting class is called date formatter. And the pattern generator class is called Date Time pattern generator.

Derick Rethans 6:00

So it has the extra word time in it?

Mel Dafert 6:03

Between some inconsistency with Intl Date Formatter, which already exists in PHP, and the Intl Date Time pattern generator, or if we make sure PHP is internally consistent and omit the time in all cases. So far consensus seems to lean towards the second option. This is also what the Hack people decided to use.

Derick Rethans 6:24

And I believe that's the one you are wanting to go with in this RFCs as well, right?

Mel Dafert 6:28

Exactly. So far, everybody voted slide, or like express themselves to slightly favour the version without time. So that's the one I'm going with.

Derick Rethans 6:40

Of course, as you mentioned, this is a fairly small change to it, but the RFC talks a bit about things to add in the future, because I believe you weren't suggesting to add all of these Intl functionality straightaway. What is this future scope?

Mel Dafert 6:55

ICU would also expose more methods around the skeletons, for example, turning a pattern back into its skeleton, or building a list of skeleton and then mapping to the patterns from scratch. That's what you would do in theory if you added your own special locale to this.

Derick Rethans 7:17

I'm not sure how to do that with PHP actually, but I think ICU allows you to build your own basically files with settings right?

Mel Dafert 7:25

Exactly. This is omitted all of this, for simplicity, and because they couldn't think of a use case for it, personally, at least. If someone does need them, they could easily be added. It would just be a bunch of extra methods on the, on the class.

Derick Rethans 7:43

I know that ICU has so much functionality that hasn't been exposed to PHP, because there's just so much of it right?

Mel Dafert 7:50

Extremely, yes. I did see that Hack decided to expose all of them, like all the methods that the class has, but I really don't see the use of having to document and test all of these methods when really only one is going to be used. So I've decided to just go for the one that I can actually see people using.

Derick Rethans 8:14

And it is always easy to get smaller parts added to PHP than big things, to begin with.

Mel Dafert 8:21

Exactly.

Derick Rethans 8:22

How has the reception been so far?

Mel Dafert 8:24

I haven't gotten feedback from too many people, but it seems positive so far. A few people that did give some feedback were constructive and seem to seem to like the idea of adding this.

Derick Rethans 8:36

I reckon outside of English speaking countries this is quite an important thing to actually support, especially as we just discussed, people are picky about how these things are formatted.

Mel Dafert 8:46

Very picky.

Derick Rethans 8:48

So the name that you're going for would be Intl Date Pattern Generator, would it also support patterns for the time itself?

Mel Dafert 8:55

Of course, just like Intl Date Format also support formatting time.

Derick Rethans 9:02

It would be strange if it didn't, to be honest.

Mel Dafert 9:04

Yeah.

Derick Rethans 9:05

When do you think you're going to put us up for a vote for inclusion to PHP 8.1?

Mel Dafert 9:10

I think I sent out the first email about two weeks ago for opening the discussion. So I was planning to send out the heads up, either today or tomorrow, and opening the vote after that.

Derick Rethans 9:23

Okay. To be fair, I think there is very little controversy in this one, so it would surprise me if it didn't pass.

Mel Dafert 9:30

That's reassuring. I am somewhat anxious about them.

Derick Rethans 9:33

It's not controversial, it is an, it is perhaps a niche thing but it is something that is useful, so I can't see people really be opposing to this. To be fair, I think it looks like just an omission from when the Intl extension was written in the first place.

Mel Dafert 9:48

That's true. It might have not been supported in ICU at that point.

Derick Rethans 9:54

That is a good point as well because I think the Intl extension came with PHP five three, or five four, which I think is now eight years ago or something like that.

Mel Dafert 10:04

I think, I think ICU might have not had it at the end. It's an old word, like it's an all supported versions of PHP.

Derick Rethans 10:13

That is good to know. Would you have anything else to add?

Mel Dafert 10:16

No, I think that's it.

Derick Rethans 10:17

Thank you for taking the time today to talk to me about your proposal to add the Intl date pattern generator to PHP 8.1

Mel Dafert 10:25

Of course. Thank you for having me.

Derick Rethans 10:29

Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


PHP Internals News: Episode 84: Introducing the PHP 8.1 Release Managers

PHP Internals News: Episode 84: Introducing the PHP 8.1 Release Managers

In this episode of "PHP Internals News" I converse with Ben Ramsey (Website, Twitter, GitHub) and Patrick Allaert (GitHub, Twitter, StackOverflow, LinkedIn) about their new role as PHP 8.1 Release Managers, together with Joe Watkins.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi, I'm Derick, welcome to PHP internals news, a podcast, dedicated to explaining the latest developments in the PHP language. This is episode 84. Today I'm talking with the recently elected PHP 8.1 RMs, Ben Ramsey and Patrick Allaert. Ben, would you please introduce yourself.

Ben Ramsey 0:34

Thanks Derick for having me on the show. Hi everyone, as Derick said I'm Ben Ramsey, you might know me from the Ramsey UUID composer package. I've been programming in PHP for about 20 years, and active in the PHP community for almost as long. I started out blogging, then writing for magazines and books, then speaking at conferences, and then contributing to open source projects. I've also organized a couple of PHP user groups over the years, and I've contributed to PHP source and Docs and a few small ways over the years, but my first contributions to the project were actually to the PHP GTK project.

Derick Rethans 1:14

Oh, that's a blast from the past. You know what, I actually still run daily a PHP GTK application.

Ben Ramsey 1:21

Oh, that's interesting. What does it do?

Derick Rethans 1:23

It's Twitter client.

Ben Ramsey 1:24

Did you write it.

Derick Rethans 1:26

I did write it. Basically I use it to have a local copy of all my tweets and everything that I've received as well, which can be really handy sometimes to figuring out, because I can easily search over it with SQL it's kind of handy to do.

Ben Ramsey 1:41

It's really cool.

Derick Rethans 1:42

Yep, it's, it's still runs PHP 5.2 maybe, I don't know, five three because it's haven't really been updated since then.

Ben Ramsey 1:49

Every now and then there will be some effort to try to revive it and get it updated for PHP seven and eight, but I don't know where that goes.

Derick Rethans 1:59

I don't know where that's gone either. In this case, for PHP eight home there are three RM, there's Joe Watkins who has done it before, Ben, you've just introduced yourself, but we also have Patrick Allaert, Patrick, could you also please introduce yourself.

Patrick Allaert 2:13

Hi Derick, thank you for the invitation for the podcast, my name is Patrick Allaert. I am a Belgian freelancer, living in Brussels, and I spent half of my professional time as a IT architect and/or a PHP developer, and the other half, I am maintaining the PHP extension of Blackfire, a performance monitoring solution, initiated by Fabien Potencier.

Derick Rethans 2:39

I didn't actually know you were working on that.

Patrick Allaert 2:40

I'm not talking much about it but more and more. So I succeeded to Julian Pauli, who by the way was also released manager before so now I'm working with Blackfire people. It's really great, and this gives me the opportunity to spend about the same amount of time developing in C and in PHP. This is really great because at least I don't. It's not just only doing C. I, at least I connect with what you can do with PHP. I see the evolution from both sides. And this is really great. It's great, it's also thanks to you Derick, you granted me access to PHP source codes. That was to contribute to testfest something like 12/13 years ago, it was, CVS, at that time.

Derick Rethans 3:28

CVS, so now I remember that. Basically, what you both of you're doing is making me feel really old and I'm not sure what I like that or not. I think we all have gotten less head on our heads and greyer in our beards. In any case, what made you volunteer for being the PHP 8.1 RM?

Patrick Allaert 3:46

In my case, I think there were two two reasons is that PHP really brings a lot to me in my career, everything is built around my expertise in PHP and its ecosystem. By volunteering as a release manager. I think I can give something back to PHP, because the last time I contributed to source code of PHP, it was really years ago. If I remember it was array to string conversion that was very silent and not emitting any notice; now it's warning. In the meantime, so I think that was PHP 5.0,

Derick Rethans 4:22

Ages ago.

Patrick Allaert 4:23

Ages ago. Indeed. I was quite passive I was mostly reading on PHP internals, and most of the time now that is quite big so if, if I had to say something I could always see some someone who already just said the same thing so I was not saying: plus one. This is one of the reason and the second one I think is that I think it's kind of a unique opportunity, and I can learn a couple of things. I think, on day one when the Rasmus gave me the access, saying that I can do to OAuth authentication on SSH and that was: okay, day one I already learned something, so that was really cool.

Derick Rethans 4:58

And you Ben, I think you tried to be the PHP eight zero release manager as well at some point. That didn't happen at the time, but you've tried again.

Ben Ramsey 5:06

I almost didn't try again. I don't know why but when Sara announced it this year, I thought about it, and I don't know, I tossed it around a little bit, but I've been wanting to do it for a long time and I've noticed as Joe Watkins recently put it on a blog post that we need to help the internals avoid buses. So since this is a programming language that I've spent a lot of time with just as Patrick mentioned, both in and out of my day jobs. I want it to stick around to thrive. Since I'm not a C guru, but I do have a lot of experience managing open source software. I wanted to volunteer as a release manager, and I hope that I can use this as an opportunity to inspire others who might want to get involved, but don't know how.

Derick Rethans 5:55

And of course you just mentioned Joe, Joe Watkins, who is the third PHP release manager for 8.1, and that is a bit of a new thing because in the past, when the past many releases I can remember you've only had two most of the time.

Ben Ramsey 6:09

I think, on the mailing list that came up early on in the thread, and there was a general consensus, I think, consensus may be the wrong word, but there were a couple of people who spoke up and said that they wouldn't mind seeing multiple rookies or mentees or whatever you want to call us, and Joe when he volunteered to be the veteran, and he was the only one who volunteered as the veteran. He said that he would take on two. And so that's that's why Patrick and I are both here and I think that's a good idea, because it will continue to help, you know, us to avoid buses.

Derick Rethans 6:46

Yep. And if you're three, you only have once every 12 weeks. Whereas of course, in my case doing it for PHP 7.4 it's every four weeks, because it's me on my own, isn't it. Which is unfortunate that these things happen because people get busy in life sometimes. Getting started being a PHP release manager can be a bit tricky sometimes because just before we started recording, I had to add you to a few mailing lists. Do you think you've now have access to everything, or what do you need access to to begin with?

Patrick Allaert 7:18

There is the documentation about release managers, what are you supposed to do, and, and there is an effort of documentation, what you have to ask, in terms of access, and that's great. We are probably going to contribute with our findings to, to improve the documentation. Once you did a bit of the setup, mainly needs to access the servers. You should also know what is the workflow and what are the usual tasks. This is mentioned in the documentation, but I think it would be better to have a live discussion with someone that already did it. The fact that we are doing it with Joe Watkins, who is not only a release manager of 8.1, but also previous release manager, that should be really smooth, to, to see what the the orders and what is the routine to do. To do so, why do you think Ben?

Ben Ramsey 8:16

I agree. I think that, I mean we've only just gotten started. It's only this I believe is what was it two weeks ago that we, that this was announced. So this is the first time that Patrick and I have actually spoken face to face. Hi, Patrick! We've communicated by email and slack. I'm sorry not Slack, StackOverflow chat. Joe has given us a lot of good pointers. I feel like some of the advice he's given his been really good, but it's like Patrick said, we haven't really had like a live, like one on one chat, or face to face chat, where we could kind of get caught up on things and understand what the flow looks like. So last week I started going through a lot of the pull requests on GitHub. And I've been tagging them as bug fixes or are enhancements, and there's also an 8.1 milestone that I've been adding to a lot of the tickets, are the pull requests, and I've merged a few of them, but I think that I've merged them a little prematurely. So there were some funny things that came up out of that. I do plan to blog on this, but one of Nikita's comments in the Stack Overflow chat was, you've just made it your personal responsibility to add tests for uncovered parts of the Ristretto255 API.

Derick Rethans 9:40

Right, exactly say because I'm doing release management for PHP seven four. I don't do any merging at all. The only thing I'm doing is making the packages, and then coordinating around them. I'm not even sure whether it is a responsibility of a release manager to do.

Ben Ramsey 9:55

It may not be a responsibility. I felt like it was helpful maybe to go ahead and take a look and see where things were trying to follow up with people, to get them to respond if something had been sitting there for two weeks or so without any kind of movement. I would, you know, leave a message saying what's the status of this.

Derick Rethans 10:19

I know from the documentation that we have on our Release Management process. And many of these steps actually been replaced by a Docker container that actually builds the binaries, so I'm not sure whether Joe I've mentioned that to you yet, because I'm not sure whether that was around when he did release management, the previous time.

Ben Ramsey 10:36

Right, it wasn't around either when he did release management, but he's also mentioned that he would like for us to learn how to do it without the Docker container, even if we do plan to use the Docker container.

Derick Rethans 10:48

That's fair enough, I suppose. I have never had to do that, but that there you go. Now, what is the timeline like?

Patrick Allaert 10:56

In terms of timeline I think the very first thing is being all three release managers having live discussion to define what, what we should do, when we should do, and how. This way we clearly knows our responsibility and the sequence, and also how we are going to organize. Do we do every three releases? We share the task? How are we going to do the work together. In terms of timeline I think the very first release is going to happen in June, if I remember correctly. I set up an agenda sheet with ICAL so that we all can put that in our calendar, nothing really clear on my side.

Derick Rethans 11:41

From what I can see from the to do list that the first alpha release is June, 10, which is exactly a month away from when we are recording this.

Patrick Allaert 11:51

Right, yeah, it's one month come down before the very first one. I think it might be great that the very first release being made by by Joe, so that we can really see every single step he's doing, so that we can do the same. However, I guess it's kind of a shared responsibility to do triage of bugs and pull requests.

Ben Ramsey 12:14

Right. I think there is some desire among the community to see these releases in real time at least a few of them. So I'm going to try to encourage us to stream some of them maybe live, or at least record it and put it up somewhere for people to kind of just see the process to demystify it, so to speak.

Derick Rethans 12:35

I actually tried it a few months ago to record it, but there were so many breaks and pauses and me messing things up, and me swearing at it, that I had to throw away the recording. I mean the release went out just fine but like absolute as again... I can imagine the first few times, you're trying this there might be some swearing involved, even though you might not vocalize that swearing.

Ben Ramsey 12:56

Oh I'll vocalize it.

Derick Rethans 12:58

Fair enough. This is something that is that you're going to have to do for the next three and a half years. Do you think you'll be able to have the time for it in another three years?

Ben Ramsey 13:08

I mean for myself I I'm committed to it, I definitely believe that I'll have the time over the next three and a half years, and I'll make the time for it.

Derick Rethans 13:18

What about you, Patrick?

Patrick Allaert 13:20

Exactly the same. I think it's the least that I can do to PHP, in terms of contributing back, there will be some changes because I it's not like it's, it's not like the infrastructure is something that doesn't change, like for example recently, GitHub, being more having more focus rather than our Git infrastructure. So the changes that will happen, we will have to adapt, I have the impression that release manager has to, every time it's adapting to change this, and that will be very interesting.

Derick Rethans 13:53

Luckily we haven't had too many. The only thing I had to change with a change from git dot php.net to get up, was my local remote URLs. So there wasn't actually a lot to do, except for running git remote set-url. I was pleasantly surprised by this because if anything messing around with Git isn't my favourite thing to do.

Ben Ramsey 14:14

Also, merging is a little bit more streamlined now you don't have to go to qa.php.net to do that.

Derick Rethans 14:21

I've never done anything without

Ben Ramsey 14:23

Really? Oh, I guess you would commit directly to git.php.net?

Derick Rethans 14:27

Yep.

Ben Ramsey 14:28

If there were PRs on GitHub, the only way to merge them well, probably wasn't the only way but one way to merge them was to go to qa.php.net, and if you were signed in with your PHP account, you were able to see all the pull requests, and choose to merge them.

Derick Rethans 14:46

Yep, also something I've never done as an RM. The only way how I have reacted with pull requests is commenting on the pull requests, and I wouldn't merge them myself. With the only exception of security releases where you need to cherry pick from certain branches into your release branches. I'm not always quite sure about it as the responsibility for release managers actually do the merging into the main branches. From what I've understood is it's always the people that made the contributions, who just merge themselves, and you then sometimes need to make sure that they merge into the right branch instead of just master, which is what, as far as I know, the, the buttons on GitHub do.

Ben Ramsey 15:21

Well the individual contributors, in this case, if they're doing like a bug fix or something, most of them, or many of them aren't don't have permission to do the merging, so someone else has to merge it, like, often I see Nikita merging, a lot of the pull requests.

Derick Rethans 15:37

Maybe I've just been relying on Nikita to do that then. I'm not sure how, bug fixes are merchants debug fig branches. I think it's usually been done by people that have access already anyway, because it's often either Nikita or Cristoph Becker, or Stas, and the main developments, or the main other new things that people don't have access to are usually to master. So I guess there's a bit of a difference now. I'm not sure what if any other questions, actually, would have anything to add yourself?

Patrick Allaert 16:05

maybe something that would be quite challenging is the very recent discussion about the system that we, that we might change from. The system or the issue tracker with where we have all the bugs. I understand the current issues, I understand as well the drawbacks of what is possibly, for example GitHub issues. It might be great for some, would it be great for us? If we do it was going to be in the bring a lot of changes, and I think, 8.1 will be already slightly impacted by the change to GitHub in terms of pull request strategies, but potentially there will be another change, which is around the bugtracking system.

Derick Rethans 16:54

I have strong opinions about this, but we'll leave that for some other time. What about you, Ben?

Ben Ramsey 17:00

Right, I actually don't think that we're going to end up making a lot of changes in that regard, very, like, not in the near term, probably. But I did want to point out, or promote that I've started journaling some of these experiences, and capturing information mainly for my own purposes, but I'll be posting these publicly so that others can follow along. My blog is currently down right now.

Derick Rethans 17:28

That's because you're using Ruby isn't it?

Ben Ramsey 17:30

That's because I'm using Ruby. The short story of it is that there are some gems that were removed from the master gem repository at some point in the past, or the versions I'm using were removed, either for security reasons or what I have no idea why. And that's put, put it into a state where I just can't easily update. I just haven't, I just don't care, right now, so I plan on migrating to something else. In the short term, I'm not going to be doing that. So I've started writing at https://dev.to/ramsay and Dev.to is just a developer community website. If you're on Twitter. It's run by @thePracticalDev, I'll be, I'll be blogging there.

Derick Rethans 18:18

And I'll make sure to add a link to that in the show notes as well. Thank you for taking the time this afternoon, or morning, to talk to me about being a PHP 8.1 release managers.

Ben Ramsey 18:28

Thank you for having me on the show.

Patrick Allaert 18:30

Thank you, Derick for that podcast. I'm really glad you invited us.

Derick Rethans 18:39

Thank you for listening to this installment of PHP internals news, a podcast, dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.