PHP Internals News: Episode 63: Property Write/Set Visibility


In this episode of "PHP Internals News" I talk with André Rømcke (Twitter, GitHub) about an RFC that he is working on to get asymmetric visibility for properties.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 63. Today I'm talking with André Rømcke, about an RFC that he's proposing titled property write/set visibility. Hello André, would you please introduce yourself?

André Rømcke 0:38

Hi Derick. Where to start? That's a wide question, but I think I'll focus on the technical side of it. I've been doing PHP for 15-plus years now. I actually started at eZ Systems, now Ibexa, where I met you, Derick.

Derick Rethans 0:56

Yep that's a long time ago for me.

André Rømcke 0:58

And, well, I came from .NET and the front-end side of things. Eventually I learned more and more PHP, and you were one of the ones getting me deeper into that while you were working on eZ Components. I was trying to profile and improve eZ Publish at that point. A long time ago, though. I've been working in engineering since then, and I'm now actually working more with training, services, and consulting. One of the pet peeves I've had with PHP for a long time has been properties, so several years ago I wanted to look into this topic overall: readonly, immutable. To be frank, from our side the only thing I really need is readonly, but I remember the discussions in 2012/2013, which I was involved in at least from the sideline: readonly, and then there was property accessors. And I remember there were a lot of people with different needs. So that's the background of why I proposed this.

Derick Rethans 2:04

We'll get back to these details in a moment, I'm sure. The title of the RFC is property write/set visibility. Can you give me a short introduction of what visibility actually means in this context?

André Rømcke 2:16

Visibility: I just use the word visibility because that's what the documentation on php.net usually says, but it's about access to the property, so to say, or it being visible to your code. That's it on visibility, but today we can only set one rule, so to say. We can only say: this is either public, private, or protected. In other languages there are, for good reasons, possibilities to have asymmetric visibility, or disconnected, or whatever you want to call it, between write and read. And then with accessors you'll also have isset() and unset(), but that's really not the topic here at this point.

Derick Rethans 2:56

PHP sort of supports these kinds of things with its magic methods, like __get() and __set(), and then also __isset() and __unset(). You can sort of implement something like this, of course, in your own implementation, but that's ugly. I mean, there was no other way of doing it, and I remember that approach being used in eZ Components, which you mentioned earlier. What is the main problem that you're wanting to solve with what this RFC proposes?
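As an illustration of the magic-method workaround described above, here is a minimal sketch of a property that outside code can read but not write. The class name `Content` and the property `id` are hypothetical, chosen only for this example; this is one possible way to do it, not the approach any particular library uses.

```php
<?php
// Hypothetical example class: "id" is readable from outside, but not writable.
final class Content
{
    /** @var array<string, mixed> Backing store for the readable properties. */
    private $data;

    public function __construct(int $id)
    {
        $this->data = ['id' => $id];
    }

    // Runs when outside code reads a property that is inaccessible or undefined.
    public function __get(string $name)
    {
        if (!array_key_exists($name, $this->data)) {
            throw new OutOfRangeException("No such property: $name");
        }
        return $this->data[$name];
    }

    // Runs on outside writes; refusing here is what makes the property read-only.
    public function __set(string $name, $value): void
    {
        throw new LogicException("Property $name is read-only");
    }

    // Keeps isset($object->id) working for the emulated property.
    public function __isset(string $name): bool
    {
        return isset($this->data[$name]);
    }
}

$content = new Content(42);
echo $content->id, "\n";
```

Every read goes through a method call and every property needs this boilerplate, which is the cost the speakers refer to.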

André Rømcke 3:25

The high-level use case is to let people, somehow, define that their property should not be writable. This has many benefits when you have multiple APIs: you can say that this property should be readable, but I don't want anyone else but myself to write to it. And then you have different forms of this. You have the immutable case, where you literally would like to specify that it's only written to in the constructor, maybe unset in the destructor, maybe dealt with in clone and so on, but besides that, it's not writable. I'm not going into that yet, but I was at least trying to lay the foundation for it by allowing the visibility, or the access rights, to be asymmetric, which I think is a building block for moving forward with immutability, and, unintentionally, also accessors, but that's a special case.

Derick Rethans 4:24

What is the syntax that you're suggesting to implement this, this feature?

André Rømcke 4:30

Currently in the RFC there are two proposals. There's one where you have, for instance, public:private. First you define the read visibility and then you define the write visibility. That was based on a proposal made on the mailing list back in 2012. The other one is inspired by Swift, the Apple language, where they actually have a concept for this. So they have public, then, being the read access, and then they have private, and in parentheses they have (set). Basically this is then the set visibility. A benefit of this is that it will probably be a better match for our reflection API, in terms of getting the modifier names and stuff like that. Secondly, the terminology with set aligns kind of nicely with accessors, so that's why I added this alternative suggestion to the syntax.
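To make the two spellings easier to picture, the sketch below shows them in comments only, since neither was valid PHP at the time of this conversation, followed by the conventional private-property-plus-getter equivalent that does run. The exact placement of the type in the proposed declarations, the class name `Article`, and the property `id` are assumptions made for illustration.

```php
<?php
// Syntax sketches as described in the conversation (not valid in PHP 7.4 or 8.0):
//
//   public:private int $id;       // read visibility first, then write visibility
//   public private(set) int $id;  // Swift-inspired: "private" applies to set only
//
// The same intent expressed with what the language already offers:
final class Article
{
    private $id;

    public function __construct(int $id)
    {
        $this->id = $id;
    }

    // Public read access; writes stay private to the class.
    public function getId(): int
    {
        return $this->id;
    }
}

$article = new Article(3);
echo $article->getId(), "\n";
```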

Derick Rethans 5:27

Would either of these would it be possible to extend this to get actual setters and getters later?

André Rømcke 5:32

Yes. Well, this is a good question actually, and Larry brought this up on the mailing list, in terms of how this actually should line up with potential future accessors, if those are ever added. He made a good point, and I started to rethink this a bit, so I think I might have to discuss it with him and also Nikita. His point was: okay, so if we add this now, how will this look if you add accessors to your code later? My mindset was that this would be an either/or. Or, this would be kind of a shorthand for what you can specify in much more detail with accessors. So once you use accessors you would remove this keyword, because it doesn't have a meaning any more. It's basically syntactic sugar, so to say, or shorthand.

Derick Rethans 6:21

Besides the two that are in the RFC. Did you consider any other syntaxes that you discarded?

André Rømcke 6:26

Yeah, I actually had a previous RFC, which is linked to in this one, where I was looking into how it might look with the new attributes instead, directly for the readonly and immutable keywords, for instance. Some might find it strange that I'm referring to both keywords, because in some languages they mean one and the same. That first draft explored the attribute syntax a bit, and how that could work. I like that a lot actually. But it made me realize that underneath, what we're actually dealing with is asymmetric visibility. I don't think people necessarily agree with me, but I think, at least in how we expose this in the reflection API, it's much nicer to work with if it's purely about "okay, do I have write access or not", as opposed to having to check whether there is an attribute called readonly in the reflection code, which for me looks like a code smell. It looks like it's solving the wrong problem, or, I don't know how to phrase it.

Derick Rethans 7:32

And I think I agree with you there. I always consider attributes as something that annotates a property, but visibility is something so inherent to a property that it doesn't really make a lot of sense to do it that way. But that's my feeling about it. I mean, it makes sense to have an attribute for JIT or no JIT, because that has some contextual meaning to something else. Similarly, with the ORM attributes or column attributes that people have been suggesting on the mailing list, they convey information about the properties to some other third-party tool, not necessarily to PHP itself. PHP doesn't care whether something is JIT or not; it's OPcache that cares about it, and this would be similar, right?

André Rømcke 8:13

So in this case it might have been better as a keyword. I was exploring that as well, and that's what's been done in the past: in 2012, then later in 2018 with the immutable RFC, and then again in 2020 with the write-once RFC, which has different semantics and focuses more on the write-once part. But anyway, those three cases were keywords. I would guess that you would rather prefer a keyword for this kind of thing.

Derick Rethans 8:42

Oh, you have keywords already, right? I mean, private and public are keywords, and the alternative that you have, with a modifier on an existing keyword, is a similar way of doing it. I haven't really thought about which one I prefer over the other, but I would definitely prefer something like this over an attribute.

André Rømcke 9:00

Either way, the benefit of adding a keyword or an attribute for this is that you could also allow them on the class level. So you could say: unless anything else is defined, please let this be the default, and let all properties be, for instance, immutable unless they say otherwise. They would need a keyword for "not immutable" or something like that. In the immutable case, if you say on the class that it's immutable, the whole class should be immutable, I suppose.

Derick Rethans 9:27

I don't think it makes sense to have that subdivided, does it?

André Rømcke 9:30

For me, at least, that is the distinction which could exist between readonly and immutable. Take readonly, if you look away from how C# is defining it and rather look to Rust. In Rust, readonly basically means that other modules can read it but not write it, while your own module can write to it. So that's the definition of readonly, and I think there are use cases for it. But Marco is challenging me on that, so I need to find.

Derick Rethans 10:00

It's often good when Marco challenges people on these things, because I would trust him with making that kind of semantic choice over many of the other people that are really good at working on the PHP engine, for example. You slightly touched on Reflection, which has to be modified to support this, well, either keyword or keyword modifier, or whatever you want to call it in the end. Is there something else that needs to be changed besides Reflection, and how does Reflection need to be changed?

André Rømcke 10:29

Reflection needs to, one way or the other, at least with this RFC, expose the fact that you now have disconnected, or asymmetric, read and write visibility. But other than that, I don't see a big problem for Reflection because, at least in most cases, code tends to use setAccessible(). This will work as before: basically, you will get access through the instance of the reflection property. If there's any code out there that is checking isPrivate(), isProtected(), or isPublic(), it might need to check whether the read or write visibility is as it expects. I have never had the need to use this myself, but there are probably use cases out there for code dealing with unknown objects, for instance.

Derick Rethans 11:20

And I saw in the RFC that you're suggesting to slightly change what isAccessible() does, but also add new methods to specifically check for the setability and getability. What are the names of these methods that you're wanting to add because I can't quite remember?

André Rømcke 11:37

So the names of the reflection methods currently suggested in the RFC are isSetPrivate(), isSetProtected(), isSetPublic(), and then similar for the get, so isGetPrivate(), isGetProtected(), and isGetPublic(). These would be in addition to the current isPrivate(), isProtected(), and isPublic(), which would then have to be adjusted a tiny bit to reflect the visibility of both, so to say. So in the case of public, both actually need to be public for this to return true. Same with protected: it could also be protected:public or, in the case of the other syntax, protected but public set. So, a kind of write-only property; then it would be considered protected, and likewise for private.
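For reference, this is roughly how the existing ReflectionProperty visibility checks are used today. The isSet*() and isGet*() methods named above exist only as a proposal in the RFC and are not part of PHP, so they are deliberately not called here. The class `Account` and its properties are hypothetical.

```php
<?php
// Hypothetical class with one public and one private property.
final class Account
{
    public $name = 'demo';
    private $secret = 's3cr3t';
}

$name   = new ReflectionProperty(Account::class, 'name');
$secret = new ReflectionProperty(Account::class, 'secret');

// Today a property has a single visibility that covers both reading and writing.
var_dump($name->isPublic());
var_dump($secret->isPrivate());

// Before PHP 8.1 a non-public property has to be made accessible explicitly;
// from 8.1 onwards the call is a no-op, so it is skipped there.
if (PHP_VERSION_ID < 80100) {
    $secret->setAccessible(true);
}
echo $secret->getValue(new Account()), "\n";
```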

Derick Rethans 12:34

So this actually brings me to a point. When looking at your second syntax, which has this set in between parentheses, the getter didn't have anything within parentheses. Could it make sense to have get in parentheses there, instead of having it not marked at all?

André Rømcke 12:51

It could be, but this kind of brings us to maybe a downside with the syntax, and also one point that Larry was making in terms of how this would fit with accessors, because he was basically proposing to reverse it. So it would be set colon private, for instance, so that the order would be more in line with what might come with accessors later. Nikita basically proposed that we move this to PHP 8.1, and even though I haven't updated the RFC yet, I agree. So I expect us to have more discussions around the syntax, and maybe look at this, as Nikita proposed, on a higher level: look at how these different needs can be aligned in a better way.

Derick Rethans 13:37

You mentioned PHP 8.1. If that's the case, then this is the first RFC we're discussing for PHP 8.1, which is kind of nice, because of course feature freeze is coming pretty soon now. And that's even closer by the time this podcast comes out at the end of July. Is there any potential for breaking backwards compatibility with the addition of the syntax that you're proposing?

André Rømcke 13:58

There are a few things which I haven't explored and defined in the RFC yet. With this RFC, there definitely needs to be a definition of how property_exists() and these kinds of things should work, and also other functions that check properties. There's also isset() and unset(); it needs to be defined how these should work. So I don't currently know about any clear BC breaks per se. I'm only aware of that being the case once people use it. For existing code I haven't seen any BC breaks yet, but for any code that starts to use it, you will potentially need to adapt your reflection code, and we'll need to look into how property_exists(), isset(), and so on will behave.
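As background for that open question, this small sketch shows how property_exists() and isset() already differ for a private property when called from outside the class. The class `Page` and its properties are hypothetical; how an asymmetric-visibility property would behave here is exactly what the RFC still has to define.

```php
<?php
// Hypothetical class used to show current behaviour from outside class scope.
final class Page
{
    public $title = 'Home';
    private $slug = 'home';
}

$page = new Page();

// property_exists() reports declared properties regardless of visibility.
var_dump(property_exists($page, 'slug'));

// isset() respects visibility: a private property looks "not set" from outside
// (unless the class implements __isset()).
var_dump(isset($page->title));
var_dump(isset($page->slug));
```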

Derick Rethans 14:41

What's the feedback been so far?

André Rømcke 14:43

From others in the Symfony/CMS part of the realm, it's kind of positive. They would very much like this kind of feature, to easily and more granularly say how a property should be accessed. So there's a definite need out there. There are a few language purists, maybe, or whatever we should call them, that are clearly against this, and against allowing this kind of pragmatic access where you can define whatever you want. While I kind of understand it, and to some extent agree with it, I still think there's usefulness in defining the underlying semantics of the access to the properties. Yes, for immutable it's not quite there yet. You could see a future where, if we use the words from C#, it could be public:init, for instance, or something like that, whatever the syntax is, to define that this is a public property but it's only writable in the constructor, and potentially the destructor and so on. So there's definitely a lot missing here to enable the full immutability semantics. Personally, I think this could be the underlying semantics of immutability, even though immutable would maybe be the keyword we expose and promote more for use with entities and the like.

Derick Rethans 16:05

I don't have to ask you, then, when you think you'll be putting this up to vote, if you end up retargeting this for PHP 8.1, because you have a full year to go for that.

André Rømcke 16:14

To be honest, I'm not a C developer. I think I once got some pointers from you for trying to debug the session code in PHP, but I didn't go beyond that. So, unless anyone volunteers to code the patch for this in the next few weeks, I don't see us moving this to that point right now.

Derick Rethans 16:34

I mean, this could be a good starting point for exploring how to implement all the concepts that you were talking about, such as immutability and readonly and things like that. That way we're doing it right, and it's not a bad thing that it takes longer than just the minimum discussion period of two weeks. All these complex interactions are always going to be very difficult to sort out anyway.

André Rømcke 16:56

On one side of course I would like to have this as soon as possible, but I've been waiting since 2011 so I can wait another year.

Derick Rethans 17:03

What is a year on a decade. Alright André would you have anything else to add?

André Rømcke 17:08

No. Well, one thing to add: no matter what we end up with here, there is a definite need to avoid the boilerplate code of dealing with magic methods, and also the performance hit of using those. It would be great, and simpler, to get this into the language without that stuff.

Derick Rethans 17:25

Thank you, André for taking the time this morning to talk to me, as I said, I think this will be a good starting point for a somewhat longer discussion.

André Rømcke 17:32

Thank you as well this was a great opportunity to meet you again of course, and also great to be on your podcast. It's a nice podcast to follow.

Derick Rethans 17:43

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 62: Saner Numeric Strings


In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about an RFC that he has proposed to make PHP's numeric string handling less complicated.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:17

Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 62. Today I'm talking with George Peter Banyard about an RFC that he's proposing called saner numeric strings. Hello George, how are you this morning?

George Peter Banyard 0:36

How are you I'm doing fine. I'm George Peter Banyard. I work on PHP, and I'm currently employed by The Coding Machine to work on PHP.

Derick Rethans 0:46

I actually think I have a bug swatter from The Coding Machine, which is hilarious. Huh, I can't show you that, of course, on a podcast and not on TV. But yes, I think I got it in Paris at some point at a conference there, and it's been happily getting rid of flies in my living room. Anyway, that's not what we want to talk about here today. We want to talk about the RFC that you made. What is the problem that this RFC is hoping to address?

George Peter Banyard 1:09

PHP has the concept of numeric strings, which are strings which have like integers or floats encoded as a string. Mostly that would arrive when you have like a get request or post request and you take like the value of a form, which would be in a string. Issue is that PHP makes some kind of weird distinctions, and classifies numeric strings in three different categories mainly. So there are purely numeric strings, which are pure integers or pure floats, which can have an optional leading whitespace and no trailing whitespace.

Derick Rethans 1:44

Does that also include like exponential numbers in there?

George Peter Banyard 1:48

Yes. However, trailing whitespace is not part of the numeric string specification in the PHP language. To deal with that, PHP has a concept of leading numeric strings, which are strings which are numeric only in the first few bytes. So it can be leading whitespace, an integer or float, and then whatever else afterwards: it can be characters, it can be any whitespace, and that will be considered a leading numeric string. The distinction is important because PHP will sometimes only accept pure numeric strings, but in some other places it will accept leading numeric strings. Explicit casts will accept whatever string possible and will try to coerce it into an integer. In weak mode, if you have a type hint, it will accept leading numeric strings, and it will emit an E_NOTICE that a non well formed numeric value has been encountered. When you use a purely non-numeric string, you'll get a "non-numeric value encountered" warning. So the main issue with that is that strings which have leading whitespace are considered more numeric by PHP than strings with trailing whitespace. It is a pretty odd distinction to make.
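The three categories can be approximated in a few lines. This is a rough sketch, not the engine's actual parser: the helper `classify()` is hypothetical, and the regular expression is a simplification of what PHP accepts as a number. The result for a string with trailing whitespace depends on the PHP version, which is the inconsistency the RFC targets.

```php
<?php
// Hypothetical helper: roughly sorts a string into the three categories
// discussed above. The regex only approximates PHP's own number parsing.
function classify(string $s): string
{
    if (is_numeric($s)) {
        return 'numeric';
    }
    // Optional whitespace, then an integer or float, then anything at all.
    if (preg_match('/^\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?/', $s) === 1) {
        return 'leading-numeric';
    }
    return 'non-numeric';
}

foreach ([' 123', '1e3', '123 ', '123abc', 'HelloWorld5'] as $candidate) {
    printf("%-14s %s\n", var_export($candidate, true), classify($candidate));
}

// Explicit casts never fail: they keep the leading number, or fall back to 0.
var_dump((int) '123abc');
var_dump((int) 'abc');
```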

Derick Rethans 3:01

For me to get this right, the numeric string in PHP can have whitespace at the start, and then have numbers. There's a leading numeric string that can have optional whitespace in front, numbers and digits, and then rubbish. Then there's a non numeric string which never has any numbers in it.

George Peter Banyard 3:22

No numbers in the beginning. "HelloWorld5" will be considered non numerical.

Derick Rethans 3:26

So it's a string that doesn't start with digits.

George Peter Banyard 3:29

Yes, or optional whitespace.

Derick Rethans 3:31

So there are three different numeric strings, sort of. There are two, and then one that is a string that doesn't have numbers. And you mentioned that in some places these are accepted and in other places they're not. So a typecast will accept both numeric strings and leading numeric strings. Where is the leading numeric string not accepted?

George Peter Banyard 3:53

If you use is_numeric call, it'll only return true on pure numeric strings.

Derick Rethans 4:00

And can they have whitespace at the end?

George Peter Banyard 4:02

They can only have leading whitespace. Explicit typecasting will work regardless, so even on non-numeric strings an int cast will convert it to zero, because that's how type juggling works in PHP. And for leading numeric strings, it will take just the initial leading number.

Derick Rethans 4:27

And stripping out leading whitespace if there's any?

George Peter Banyard 4:30

Stripping leading whitespace, and stripping garbage off the end if it's just a leading numeric string. String to string comparison with the double-equals comparison operator will perform a numeric comparison only if both strings are numeric, purely numeric. Whenever you do a string to int or float comparison, the string will get type juggled to an int or to a float, regardless of its numericness. So a non-numeric string will get typecast into zero implicitly, and you'll get warnings, but it has some odd behaviour. In weak typing mode, so strict types disabled, consider an int type declaration for an argument. When you pass a string to it: if it's a leading numeric string, it will convert it with an E_NOTICE, and it will throw a TypeError if it's a non-numeric string. This can be a slight issue if, for example, you pass in a hash. It should always be a string, but if it starts with a digit, then it will get type juggled to an int, and it will pass the type declaration check and just work.
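A short sketch of two of the rules just described: string-to-string comparison becomes numeric only when both sides are numeric, and in weak mode an int parameter accepts a numeric string but rejects a non-numeric one. The function `takesInt()` is hypothetical. The leading-numeric case, such as a hash starting with a digit, is left out of the executed code because the diagnostic it raises differs between PHP versions.

```php
<?php
// Weak mode: no declare(strict_types=1) in this file.

// Hypothetical function with an int parameter, to exercise type juggling.
function takesInt(int $value): int
{
    return $value;
}

// Both operands are numeric strings, so == compares them as numbers.
var_dump('1e3' == '1000');

// Neither operand is numeric, so == compares them as plain strings.
var_dump('abc' == 'ABC');

// A numeric string is coerced to int for the parameter.
var_dump(takesInt('42'));

// A non-numeric string is rejected outright.
try {
    takesInt('abc');
} catch (TypeError $e) {
    echo "TypeError for a non-numeric string\n";
}
```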

Derick Rethans 5:54

And you get a notice?

George Peter Banyard 5:56

So you get a notice. Whereas if it were a hash which starts with a letter, you would get an E_WARNING, as in a non well formed numeric string has been encountered.

Derick Rethans 6:10

That sounds quite complicated. You mentioned that there's one other place where you can use numeric strings, which is in array keys.

George Peter Banyard 6:21

Yes, array keys and string offsets. Array keys have a special semantic, which is integer strings. Those are a separate concept, but kind of the same: the string needs to start with a nonzero digit, or be exactly zero for the zero index, and it needs to be only digits. That will be interpreted as an integer key. Anything else will be interpreted as a string key. "5.5", which is a numeric float string, will stay as "5.5" as the array key. This behaviour is different to string offsets.

Derick Rethans 7:07

So you're saying that a string with "5.5" in it, in array key stays "5.5"?

George Peter Banyard 7:15

Yes, and the same if you have a string key which is "03": you'll get a string key which is "03", it won't get evaluated as three. You can try it yourself, because it is the weirdest behaviour ever. I was quite surprised about that.

Derick Rethans 7:32

You are correct, but if it's a float it gets truncated.

George Peter Banyard 7:36

Yes, to five.

Derick Rethans 7:38

Hey, I've learned something new here, I thought that would also truncate.

George Peter Banyard 7:41

That would be kind of logical, in some sense, but it doesn't.
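The array key rule is easy to try out. In this small sketch only canonical integer strings become integer keys; the variable name `$keys` is just for illustration.

```php
<?php
// Only strings that look exactly like a decimal integer become int keys.
$array = [
    '5'   => 'integer string',
    '5.5' => 'float string',
    '03'  => 'leading zero',
    '0'   => 'plain zero',
];

$keys = array_keys($array);
var_dump($keys);
// '5' and '0' are stored as the ints 5 and 0;
// '5.5' and '03' stay strings.
```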

Derick Rethans 7:46

Continuing

George Peter Banyard 7:47

Array keys have this behaviour; string offsets have the more usual behaviour of using numeric strings, as a string offset can't be a string in the first place, it can only be an integer. So that's why it's more lax, in some sense: it will use the usual semantics. However, if the numeric string is a float, or if it's a leading integer string, it'll emit the illegal string offset warning, but still use an explicit int cast to cast it to an integer. "2str" would be cast to 2, a string index "foo" would be cast to zero, and "5.5" would be cast to 5. It's all kind of confusing, and it doesn't follow other illegal offset behaviour, in some sense. If you try to pass an array as an offset you'll get a TypeError in PHP 8.

Derick Rethans 8:55

I have to admit, I am totally getting lost here. This all sounds so complicated, and it sounds like something needs to be done about this. Am I correctly understanding that this is exactly what your RFC is trying to do?

George Peter Banyard 9:08

Yes, this is an attempt to bring back sanity into this whole mess.

Derick Rethans 9:13

So what are you proposing here?

George Peter Banyard 9:14

The proposal is to get rid of the concept of leading numeric strings, because it's mostly weird, and it's more confusing than it needs to be. To do that, numeric strings will accept trailing whitespace, so a numeric string which has leading whitespace won't be more numeric than a string with trailing whitespace. On top of that, all current E_NOTICEs "A non well formed numeric value encountered" will be changed to emit an E_WARNING "A non-numeric value encountered". So there's a promotion in severity, in some sense, as well. That should only affect purely non-numeric strings, or leading numeric strings which have gibberish after the digits. For string offsets, numeric strings which correspond to well-formed floating point numbers will emit the more usual "String offset cast occurred" warning instead of the illegal string offset one. Leading numeric strings, which currently emit an "A non well formed numeric value encountered" notice, will emit the illegal string offset warning and still continue to evaluate to the previous value, to ease the migration to PHP 8 and for backwards compatibility. However, non-numeric strings, which don't represent a number at all, will now throw an "Illegal offset" TypeError. This would affect arithmetic operations on strings, so plus, minus, multiplication, etc. Then int and float type declarations, for internal and userland functions. Then comparison operators, which considered that numeric strings with trailing whitespace weren't numeric and so would produce false: for example, the string "123 " == the string " 123" will now produce true instead of false. The built-in is_numeric() function will return true for numeric strings which have trailing whitespace, where before it would return false. And the ++ and -- increment and decrement operators will convert numeric strings with trailing whitespace to integers or floats and use the numerical increment instead of the alphanumeric increment rules.
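A few of the changes listed above can be observed directly by running the same script on PHP 7.4 and PHP 8.0 or later. This is a sketch under the assumption that the RFC landed in 8.0 as described. The helper `describeArithmetic()` is hypothetical, and the `@` operator is used only to keep the notices and warnings out of the output.

```php
<?php
// Hypothetical helper: adds 1 to a string and reports what happened.
// "@" silences the notice or warning that some inputs raise.
function describeArithmetic(string $s): string
{
    try {
        $result = @($s + 1);
        return 'value ' . var_export($result, true);
    } catch (TypeError $e) {
        return 'TypeError';
    }
}

// Trailing whitespace: not numeric before PHP 8.0, numeric from 8.0 on.
var_dump(is_numeric('123 '));
var_dump('123 ' == ' 123');

echo describeArithmetic('123 '), "\n";    // numeric once trailing whitespace is allowed
echo describeArithmetic('123abc'), "\n";  // leading-numeric: still evaluates, with a diagnostic
echo describeArithmetic('abc'), "\n";     // non-numeric: a warning before 8.0, a TypeError after
```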

Derick Rethans 11:35

You say whitespace, do you just mean the space characters or does it include like tabs and returns as well?

George Peter Banyard 11:43

Tabs, new lines, vertical spaces. Mostly what you would consider whitespace.

Derick Rethans 11:48

I guess there's a horizontal tab and a vertical tab and stuff like that. What's the potential for breaking changes here? Because messing around with PHP's type juggling rules is always a bit tricky. What are the BC implications here?

George Peter Banyard 12:05

I would expect most reasonable code to not be affected. There are two changes. One, which is relatively minor, is if, for some reason, your code needs the string to be numeric and only have leading whitespace but no trailing whitespace, which is a pretty specific requirement. Then accepting trailing whitespace would break that code, because that would now be considered a valid numeric string, whereas the code assumes it would be non well formed, which is an odd requirement to have. That's why I don't expect it to be that big. The second one, the more problematic one, is code which makes liberal use of leading numeric strings. If, for example, you parse the DOM, an XML or a CSS file or something, and you get 2px, for example, for 2 pixels, and you just take that string and dump it into various things and expect to get two out of it. Sometimes you will now need to use an explicit cast to get the previous behaviour. You would be notified of that by an E_NOTICE in PHP 7.4, and informed with an E_WARNING in PHP 8.

Derick Rethans 13:28

Considering you get a warning ish thing in both cases it's not really a BC break, I mean it's not suddenly going to start throwing an exception, which could break your code flow for example.

George Peter Banyard 13:39

Yes, and also all behaviour should be identical between PHP 7.4 and PHP 8. If there wasn't a warning before, or if it was a notice and it's been moved to a warning, the behaviour should be the same, except for non-numeric strings, which sometimes will throw a TypeError. That's most likely a bug, where you were expecting something to be integer-like and it's just a pure string.

Derick Rethans 14:07

Oh, of course for user input, we know we shouldn't be casting anyway; we should use the filter extension to get to this data. Does this impact the filter extension at all?

George Peter Banyard 14:19

No, I don't think so. I don't think the filter extension uses the C is_numeric, is_numeric_string function. And it uses its own parsing of strings.

Derick Rethans 14:30

Have you gotten any feedback about this so far?

George Peter Banyard 14:33

Some feedback was to clarify some of the changes and whether they would affect code. Also, I had some doubt about how to handle the string offset case. Initially, one of the proposals was to promote the leading numeric strings to emit the warning, but also return zero instead of returning the previous value, which would be pretty hard to detect, although they emitted a notice previously. So I've changed that again to be more in line with the behaviour it has in PHP 7, where it just truncates the gibberish and casts it to an integer. So at least that BC concern should be removed.

Derick Rethans 15:24

As I mentioned, this is all pretty hard to wrap my head around, not because you don't explain this correctly, but mostly because it's so complicated to begin with. I would probably recommend that people that listen to this podcast episode also have a look at the RFC, because it comes with examples for these cases as well, and sometimes just looking at the examples is a lot easier than listening to the exact descriptions of strings as parsed by the PHP engine.

George Peter Banyard 15:53

Yes, which at times can be mostly weird and nonsensical, but mostly based on Perl semantics.

Derick Rethans 16:02

Sometimes we steal from Java, sometimes we steal from Rust, and sometimes from Perl, it seems then. And there's nothing wrong with that.

George Peter Banyard 16:10

There's nothing wrong, and in some sense, if you steal all the good things you get a better language, and sometimes you make some slight mistakes along the way.

Derick Rethans 16:19

Let me not start about the @@ operator. We'll keep that for another episode, maybe.

George Peter Banyard 16:25

Yes.

Derick Rethans 16:26

When do you think you're going to put this up for a vote?

George Peter Banyard 16:29

So I started the discussion early this week, so on the 29th of June. I would expect a two-week discussion period, because feature freeze is coming up pretty soon, and it needs to be voted on and implemented into core before that. Voting should start on the 13th of July for two weeks until the 27th, which would give like another week to land it into core and tweak the implementation details.

Derick Rethans 16:59

I'm expecting a lot more RFCs just wanting to get in, just before the deadline.

George Peter Banyard 17:05

I suppose so, it's also kind of difficult because it's getting really tight.

Derick Rethans 17:09

Okay, George. Thanks for this. Would you have anything else to add?

George Peter Banyard 17:13

No, thanks for having me on the show again Derick, and I hope you have a nice evening.

Derick Rethans 17:17

Thanks very much.

Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 61: Stable Sorting


In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about his Stable Sorting RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:18

Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 61. Today I'm talking with Nikita Popov about a rather small RFC that he's proposing called stable sorting. Hello Nikita, how are you this morning?

Nikita 0:36

Hey, Derick, I'm great. How are you?

Derick Rethans 0:38

Not too bad myself. Let's jump straight in here. The title of the RFC is stable sorting, what does that mean, what is stable sorting, or what is sorting stability?

Nikita 0:48

Sorting stability refers to the behaviour of the sort when it comes to equal elements. And equal here means that the sort comparison function, for example the one you pass to usort, says the elements are equal, but there is still some way to distinguish them. For example, if you're sorting some objects, to take the example from the RFC, we have an array with users, and users have an age, and we use usort to only sort the users by age. Then according to the comparison callback all users with the same age are equal. But of course, the user also has other fields on which we can distinguish it. And the question is now in what order will equal elements appear. If we have a stable sort, then they will appear in the order they were originally in. So that order is not going to change.
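To make the users-by-age case concrete, here is a minimal sketch. The names and ages are made up for illustration and are not taken from the RFC. The array is deliberately small, so the result is the same on PHP 7 (where small arrays were already sorted stably) and on PHP 8.

```php
<?php
// Hypothetical sample data: names and ages are invented for this example.
$users = [
    ['name' => 'Alice', 'age' => 30],
    ['name' => 'Bob',   'age' => 25],
    ['name' => 'Carol', 'age' => 30],
    ['name' => 'Dave',  'age' => 25],
];

// The callback only looks at age, so Alice/Carol and Bob/Dave compare as equal.
usort($users, fn (array $a, array $b): int => $a['age'] <=> $b['age']);

$names = array_column($users, 'name');

// With a stable sort, equal elements keep their input order.
echo implode(', ', $names), "\n"; // Bob, Dave, Alice, Carol
```

Bob stays ahead of Dave and Alice stays ahead of Carol because that is the order they had in the input.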

Derick Rethans 1:41

And that is not what PHP sorting mechanism currently does?

Nikita 1:44

Right. PHP currently uses an unstable sort, which means that the order is simply unspecified. It will be deterministic. I mean if you take the same input array and sort it, then every time you will get the same result. But there is no well specified relative order of elements. There's just some order. The reason why we have this behaviour is that there are, I would say, only two sorting algorithms. There is merge sort, which is a guaranteed n log n sort that is stable, but has the disadvantage that it requires additional memory to perform the merge step. On the other side there is quicksort, which is an average case n log n sorting algorithm and is unstable, but does not require any additional memory. And in practice, everyone uses one of these algorithms, usually with a couple of extensions. For merge sort, nowadays we use timsort, which is still based on the same underlying principle, and for quicksort, we have a sort which is better than plain quicksort, which tries to avoid some of the bad worst case performance which quicksort can have. PHP currently uses a quicksort, which means that our sorting results are unstable.

Derick Rethans 3:07

Okay, and this RFC is suggesting to change that. How would you do that? How would you modify quicksort to make it stable?

Nikita 3:15

Two ways. One is to just change the sorting algorithm. So as I mentioned, the really popular stable sort is timsort, which is used by Python, by Java, and probably lots of other languages at this point. And the other possibility is to stick with an unstable sort. So to stick with quicksort, but to artificially enforce that the comparison function does not report elements as equal that are not really equal. And we can do that by introducing an extra artificial fallback comparison. We remember the order of the elements in the original array. And if the comparison function tells us that elements are equal, we will check against this original order, which means that, okay, our sort is still unstable, but because the comparison will never actually report that two elements are equal unless they are really equal, it doesn't matter for the result.

Derick Rethans 4:16

So you're basically artificially changing the key to have the original index in the array.

Nikita 4:24

That's pretty much exactly the implementation. And this is actually also how you would implement the stable sort if you'd do it in PHP code. So you would take your array and convert it into an array of pairs, where you have the original array value and the original position of the array element. Difference is just that if you do this in PHP code this is extremely extremely inefficient, in terms of memory and performance, while when we do it internally it's essentially free because we already have a little bit of unused space in each array element. We can easily store the current position.
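The userland variant described here can be sketched as follows. This is only an illustration of the "pairs" idea, not the engine's implementation, and `stableUsort()` is a hypothetical helper name.

```php
<?php
/**
 * Userland stable sort: pair each value with its original position and
 * use that position as a tie-breaker. A sketch, not the engine's code.
 */
function stableUsort(array $values, callable $compare): array
{
    // Pair every value with its original position.
    $pairs = [];
    $position = 0;
    foreach ($values as $value) {
        $pairs[] = [$value, $position++];
    }

    usort($pairs, function (array $a, array $b) use ($compare): int {
        // Ask the user's comparison first; fall back to the original
        // position only when it reports the two values as equal.
        return $compare($a[0], $b[0]) ?: $a[1] <=> $b[1];
    });

    // Drop the positions again and return only the values.
    return array_map(fn (array $pair) => $pair[0], $pairs);
}

// Sort by length only; words of equal length keep their input order.
$sorted = stableUsort(
    ['pear', 'fig', 'kiwi', 'nut'],
    fn (string $a, string $b): int => strlen($a) <=> strlen($b)
);
echo implode(', ', $sorted), "\n"; // fig, nut, pear, kiwi
```

Because the fallback comparison never returns zero for two different positions, the underlying sort does not have to be stable for the result to be.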

Derick Rethans 5:02

Do you think there will be much of a performance hit here?

Nikita 5:04

So I expect that there is a bit of a performance hit, but for typical usage, not much. For the good case where your array does not actually contain any equal elements, the overhead should be very small, something like maybe one or two percent. If your array does contain a huge number of duplicates, then there is more overhead, and the effect is basically that the sort performance no longer depends on the number of duplicates you have. Previously, the more duplicates you had, the faster the sort became. Now, as you add more duplicates, the sorting performance will stay roughly stable. That's really the difference in performance.

Derick Rethans 5:53

If you have the numbers in the RFC I'll make sure to link to them. Is there a possibility that this is going to break any code?

Nikita 6:01

Yes, it could break tests.

Derick Rethans 6:04

Tests, because the test's output can change because the sorting order of arrays might have changed.

Nikita 6:11

Exactly. So we already had such a change in PHP seven, where we switched from a pure quicksort to a hybrid of quicksort and insertion sort, which means that effectively we have a stable sort for arrays smaller than 16 elements and an unstable sort for larger arrays, which is a weird intermediate state.

Derick Rethans 6:33

Yes.

Nikita 6:35

I think that one already had quite a bit of fallout for testing purposes. Hopefully this one will be a little bit smaller because most tests will work on a few elements. Those would have already been stable previously. But there is definitely going to be a little bit of fallout for unit testing.

Derick Rethans 6:56

At the moment we're talking about this, the RFC's already up for voting. By the time this podcast has come out, it's pretty likely that it has been accepted for PHP eight, because I think the voting was 51 to zero or something like this.

Nikita 7:10

It's 36 to zero.

Derick Rethans 7:13

There you go. Thank you, Nikita for taking the time this morning to talk to me about stable sorting.

Nikita 7:19

Thanks for having me.

Derick Rethans 7:23

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 60: OpenSSL CMS Support


In this episode of "PHP Internals News" I chat with Eliot Lear (Twitter, GitHub, Website) about OpenSSL CMS support, which he has contributed to PHP.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 60. Today I'm talking with Eliot Lear about adding OpenSSL CMS support to PHP. Hello Eliot, would you please introduce yourself.

Eliot Lear 0:34

Hi Derick, it's great to be here. My name is Eliot Lear, I'm a principal engineer for Cisco Systems working on IoT security.

Derick Rethans 0:41

I saw somewhere on the internet, Wikipedia I believe, that you also did some RFCs, not PHP RFCs, but internet RFCs.

Eliot Lear 0:49

That's correct. I have a few out there. I'm a jack of all trades but master of none.

Derick Rethans 0:53

The one that piqued my interest was the one for the timezone database, because I added timezone support to PHP a long long time ago.

Eliot Lear 1:01

That's right, there's a whole funny story about that RFC. We will have to save it for another time, but there are a lot of heroes out there in the volunteer world who keep that database up to date, and currently they're corralled and coordinated by a lovely gentleman by the name of Paul Eggert. If you're not a member of that community, it's really a wonderful contribution to make, and they need people all around the world to send in information, but I guess that's not why we're here today.

Derick Rethans 1:29

But I'm happy to chat about that at some other point in the future. Now today we're talking about CMS support in OpenSSL and the first time I saw CMS. I don't think that means content management system here.

Eliot Lear 1:41

No, it stands for cryptographic message syntax, and it is the follow on to earlier work which people will know as PKCS#7. So it's a way in which one can transmit and receive encrypted information or just signed information.

Derick Rethans 1:58

How do CMS and PKCS#7 differ from each other?

Eliot Lear 2:03

Actually not too many differences. Externally, the envelope or the structure of the message is slightly better formed, and the people who worked on that at the Internet Engineering Task Force were essentially just making incremental improvements to make sure that there was good interoperability, good support for email, encrypted email, and signed email, and for other purposes as well. So it's relatively modest but important improvements over PKCS#7.

Derick Rethans 2:39

How old are these two standards?

Eliot Lear 2:42

Goodness. PKCS#7, I'm not sure actually of how old PKCS#7 is, but CMS dates back, gosh, probably a decade or so. I'd have to go look. I'm sorry if I don't have the answer to that one.

Derick Rethans 2:56

A ballpark figure works fine for me. Why would you want to use CMS over the older PKCS#7?

Eliot Lear 3:02

You know, truthfully, I'm not a cryptographer, so the reason I used it was because it was the latest and greatest thing when you're doing this sort of work. I'm an interdisciplinary person, so what I do is I go find the experts and they tell me what to use. And believe it or not, I went and found the person who's the expert on cryptographic signatures, which is what I need. I said: What should I use? He said: You should use CMS, and so that's what I did. I ran into some troubles though, which is that some of the tooling doesn't support CMS. In particular, PHP didn't support CMS. So that's why I got involved in the PHP project.

Derick Rethans 3:40

You are a new contributor to the PHP project. What did you think of its interactions?

Eliot Lear 3:45

I had a wonderful time doing the development. There was a fair amount of coding involved, and one has to understand that the underlying code here is OpenSSL, and OpenSSL's documentation for some of its interfaces could stand a little bit of improvement. I needed to do a fair amount of work and I needed a fair amount of review, so I got a lot of support from Jakub in particular, who looks after the OpenSSL code base as one of the maintainers. I really enjoyed the CI/CD integration, which allowed me to check the numerous environments that PHP runs on. I really enjoyed the community review, and even though I didn't really have to do one in my case, I did do an RFC as part of the PHP development process, which essentially forced me to write really good documentation, or at least I hope it's really good, for all of the caller interfaces that I defined. So it was a really enjoyable experience. I really liked working with the team.

Derick Rethans 4:47

That's good to hear. Although an RFC wasn't particularly necessary here, I always find writing down the requirements that I have for my own software first, even though this doesn't get publicized and nobody's going to review it, very useful to just clear my head and see what's going on there.

Eliot Lear 5:06

Yeah, I think that's a good approach.

Derick Rethans 5:07

During the review, was there a lot of feedback where you weren't quite sure, or what was the best feedback that you got during this process?

Eliot Lear 5:15

The biggest issue that we had was how to handle streaming, and we have some code in there now for streaming, but it's unlikely to get really heavily exercised in the way that the interfaces are defined right now. It's essentially a files in/files out interface, which mirrors the PKCS#7 interface. One of the future activities that I would like to take on, if I can find a little bit more time, is to move away from the files in/files out interface and rather use an in-memory structure or in-memory interface, so that it can actually take advantage of streaming and can be more memory efficient over time.

Derick Rethans 5:56

When you say files, you actually provide a file name to the functions?

Eliot Lear 6:00

That's right. Depending on which of the interfaces you're using, there's an encrypt call, there's a decrypt call, there's a sign call and a verify call, and each of them has a slightly different interface. These are all public key, you know, PKI-based approaches, so if you're encrypting you need to have the destination certificates that you're sending to. If you're verifying, you need to have the public key chain, and if you're decrypting, you need to have the private key. But they're all file names that are passed, and it's a bit of a limitation of the original interface, in that you probably don't really want to be passing file names for most of your functions; you'd rather be passing objects that are a bit better structured than that.

Derick Rethans 6:53

Is the underlying OpenSSL interface similar or does that allow for streaming in general?

Eliot Lear 6:59

The C API allows for streaming and such. With the command line interface, it doesn't seem to me that they do any particular things with streaming. If you look at the cryptographic interface that we did for CMS, mostly it is an attempt to provide the capability that you would otherwise have using the OpenSSL command line interface, and I think the nice thing here is that we can evolve from that point.

Derick Rethans 7:26

And that progress wouldn't only be implemented for the CMS mechanism, but also for PKCS#7, as well as others that are also available.

Eliot Lear 7:35

Yes. Another area that I would like to look at, I'm not sure how easy it will be, we didn't try it this time was to try and combine the code bases because they are so close, and be a little bit more code efficient, but there are just slight enough differences in the caller interfaces between PKCS#7 and CMS that, I'm not sure I could get away with using void functions for everything I have. I might have to have a lot of switches, or conditionals in the code. But what I am interested in doing for both sets of code is, again, providing new interfaces, where instead of passing file names, you're passing memory structures of some form that can be used to stream. That's the future.

Derick Rethans 8:22

I've been writing quite a bit of Go code in the last couple of months. And that interface is exactly the same, you provide file names to it, which I find kind of annoying because I'm going to have to distribute these binaries at some point. And I don't really want any other dependencies in the form of files, so I need to figure out a way to do that without also providing those key files at some point.

Eliot Lear 8:43

Indeed, that's an issue, and for those of us who are web developers, well, I did this because I was doing some web development. A lot of the stuff that I want to do, I just want to do in memory and then pass right back to the client, and I don't really want to have to go to the file system. And right now, I'll have to take an extra step to go to the file system, and that's alright, it's not a big deal, but it'll be a little bit more elegant when I get away from that. We'll do that, you know, at an appropriate time.

Derick Rethans 9:11

Yes, that sounds lovely. I'm not an expert in cryptography either. I saw that the RFC mentions X.509. How does it tie in with CMS and PKCS#7?

Eliot Lear 9:21

X.509 is essentially a certificate standard. In fact, that's really what it is. A certificate essentially has a bunch of attributes, with a subject being one of those attributes, and a signature on top of the whole structure. The signature comes from a signer, and the signer is essentially asserting all of these attributes on behalf of whoever sent the request. X.509 certificates are, for example, the core of our web authentication infrastructure. When you go to the bank online, it uses an X.509 certificate to prove to you that it is the bank that you intended to visit. That's the basis of this, and CMS and PKCS#7 are structures that allow the X.509 standard to be serialized, so there are the Distinguished Encoding Rules that are used underneath PKCS#7 and CMS. And then CMS essentially was designed, at least in part, for mail transmission. So how is it that you indicate the certificate, the subject name, the content of the message? All of this information had to be formally described, and it had to be done in a way that is scalable. And the nice thing about X.509, as compared to say just using naked public keys, is that with naked public keys, the verifier or the recipient has to have each individual public key, whereas X.509 uses the certificate hierarchy such that you only need to have the top of the chain, if you will, in order to validate a certificate. So X.509 scales amazingly well, and we see that success all throughout the web. And so that's what CMS and PKCS#7 help support.

Derick Rethans 11:24

Like I said, I've never really done enough research into this but I think it is something that many web developers should really know how that works because this comes back, not only with mail, but also with HTTPS.

Eliot Lear 11:35

It's another part of the code, right. So CMS isn't directly used for supporting TLS connections, there's a whole set of code inside of PHP for that.

Derick Rethans 11:44

Would you have anything else to add?

Eliot Lear 11:46

I would say a couple of things. The basis of this work was that I was attempting to create signatures for something called manufacturer usage descriptions. The reason I got involved with PHP is that I'm doing tooling that supports an IoT protection project. A manufacturer usage description essentially describes what an IoT device needs in terms of network access. And the purpose of using PHP and adding the code that I added was so that those descriptions could be signed, and that's why Cisco, my employer, supported my activity. Now Cisco loves giving back to the community. This was one way we could do so, and it's something I'm very proud of when it comes to our company. And so we're very happy to participate with the PHP project. I really enjoyed working with

Derick Rethans 12:33

I'm glad to hear that. I'm looking forward to some other API improvements, because I agree that the interfaces that the OpenSSL extension has aren't always the easiest to use, and I think it's important that encryption is easy to use, because more people will use it, right.

Eliot Lear 12:49

I have to say, in my opinion, the encryption interfaces that we have today are still relatively immature. And not just CMS, the code that I wrote, which is really, you know, fresh, it just got committed, but the whole category of interfaces is something that will evolve over time. It's important that it do so, because the threats are evolving over time and people need to be able to use these interfaces, and we can't all be cryptographic experts. I'm not. I just use the code, but I needed to write some in order to use it in my case. As we go on, I think we'll enjoy richer and easier to use interfaces that normal developers can use without being experts.

Derick Rethans 13:38

PHP has been going that way already a little bit, because we started having a simple random interface, and a simple way of doing hashes and verifying hashes, to make these things a lot easier, because we saw that lots of people were implementing their own ways in PHP code, and pretty much messing it up because, as you say, not everybody's a cryptographer.
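The episode does not name the functions meant here. Assuming the "simple random interface" is `random_bytes()`/`random_int()` and the hashing helpers are `password_hash()`/`password_verify()`, a minimal sketch looks like this (the password string is a made-up example):

```php
<?php
// CSPRNG helpers, available since PHP 7.0.
$token = bin2hex(random_bytes(16)); // 16 random bytes as 32 hex characters
$pin   = random_int(0, 9999);       // unbiased random integer in [0, 9999]

// Password hashing API, available since PHP 5.5.
// The salt and algorithm parameters are stored inside the hash string.
$hash = password_hash('correct horse battery staple', PASSWORD_DEFAULT);

var_dump(password_verify('correct horse battery staple', $hash)); // bool(true)
var_dump(password_verify('wrong guess', $hash));                  // bool(false)
```

The point of these helpers is that salting, algorithm choice, and timing-safe comparison are handled by the language rather than by copied snippets.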

Eliot Lear 13:56

That's right. And so that's a really good thing that PHP did, because as you pointed out, it eliminates all the people who are going onto the net looking for the little snippet of code that they're going to include in PHP, whether that snippet is correct or not that's a big issue.

Derick Rethans 14:11

Absolutely. And cryptography is not something that you want to get wrong.

Eliot Lear 14:15

That's right, because for every line of code that you've written in this space, there's going to be somebody who's going to want to attack it, maybe several.

Derick Rethans 14:23

Absolutely. Thank you, Eliot, for taking the time this morning to talk to me about CMS support.

Eliot Lear 14:28

It's been my pleasure Derick, and thanks for having me on. And again, it was really enjoyable to work with the PHP team and I'm looking forward to doing more.

Derick Rethans 14:38

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP 8: A Quick Look at JIT


Following on from a PHP 8/JIT benchmark on Twitter, I decided to have a look myself.

I've picked an example that I know speeds up really well when reimplementing it in C. I wrote about this RDP algorithm some time ago.

What it does is to take a line of geospatial points (lon/lat coordinates), and simplifies it. It's my go-to example to show raw algorithmic performance, which is probably the best place to use a JIT for non-trivial code. I actually use this in production.
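For readers who have not met the algorithm: RDP is commonly expanded as Ramer-Douglas-Peucker. The following is a minimal sketch of that algorithm, not the benchmarked `RDP::simplify` code from the Gist. The function names are hypothetical, and it assumes points are `[x, y]` pairs measured with plain planar distance.

```php
<?php
/** Perpendicular distance from point $p to the line through $a and $b. */
function perpendicularDistance(array $p, array $a, array $b): float
{
    $dx = $b[0] - $a[0];
    $dy = $b[1] - $a[1];
    $length = hypot($dx, $dy);
    if ($length == 0.0) {
        // $a and $b coincide: fall back to point-to-point distance.
        return hypot($p[0] - $a[0], $p[1] - $a[1]);
    }
    return abs($dy * ($p[0] - $a[0]) - $dx * ($p[1] - $a[1])) / $length;
}

/** Simplify a list of [x, y] points; $epsilon is the allowed deviation. */
function rdpSimplify(array $points, float $epsilon): array
{
    $count = count($points);
    if ($count < 3) {
        return $points;
    }

    // Find the point farthest from the line between the two end points.
    $maxDistance = 0.0;
    $index = 0;
    for ($i = 1; $i < $count - 1; $i++) {
        $d = perpendicularDistance($points[$i], $points[0], $points[$count - 1]);
        if ($d > $maxDistance) {
            $maxDistance = $d;
            $index = $i;
        }
    }

    // Everything is close enough to the straight line: keep only the ends.
    if ($maxDistance <= $epsilon) {
        return [$points[0], $points[$count - 1]];
    }

    // Otherwise split at the farthest point and simplify both halves.
    $left  = rdpSimplify(array_slice($points, 0, $index + 1), $epsilon);
    $right = rdpSimplify(array_slice($points, $index), $epsilon);

    // The split point ends $left and starts $right; keep it only once.
    return array_merge(array_slice($left, 0, -1), $right);
}

// The small wiggle at (1, 0.1) is within the tolerance and is dropped.
$simplified = rdpSimplify([[0, 0], [1, 0.1], [2, 0], [3, 0]], 0.5);
```

The recursion and the tight distance loop are what make this a good candidate for a JIT: almost all of the time is spent in arithmetic rather than in I/O or library calls.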

With PHP 7.4:

$ pe 7.4dev; time php -n \
        -dzend_extension=opcache -dopcache.enable=1 -dopcache.enable_cli=1 \
        -dopcache.jit=1235 -dopcache.jit_buffer_size=64M \
        bench-rdp.php 1000
Using array (
  0 => 'RDP',
  1 => 'simplify',
)

real    0m8.778s
user    0m8.630s
sys     0m0.117s

(I realise that the opcache arguments do nothing on the command line here). This runs RDP::simplify (my PHP implementation) 1000 times in about 8 seconds.

With PHP 8.0 and JIT:

$ pe trunk; time php -n \
        -dzend_extension=opcache -dopcache.enable=1 -dopcache.enable_cli=1 \
        -dopcache.jit=1235 -dopcache.jit_buffer_size=64M \
        bench-rdp.php 1000
Using array (
  0 => 'RDP',
  1 => 'simplify',
)

real    0m4.640s
user    0m4.627s
sys     0m0.008s

It jumps from ~8.8s to ~4.6s, a reduction in time of ~4.2s (or 48%), which is pretty good.

Now I run the same with the geospatial extension, which has a C implementation.

With PHP 7.4 and the extension:

$ pe 7.4dev; time php -n -dextension=geospatial \
        -dzend_extension=opcache -dopcache.enable=1 -dopcache.enable_cli=1 \
        -dopcache.jit=1235 -dopcache.jit_buffer_size=64M bench-rdp.php 1000
Using 'rdp_simplify'

real    0m0.695s
user    0m0.675s
sys     0m0.021s

Which gives a reduction in time compared to the PHP implementation on PHP 7.4 of ~8.1s (or 92%).

So it looks like the JIT does do some good work for something that's highly optimisable, but still nowhere near what an implementation in C could do.

The code that I used is in this Gist.

This ran on a 4th gen ThinkPad X1 Carbon, making sure my CPU was pinned at its maximum speed of 3.3GHz. Although I've pasted only one result for each, I did run them several times with very close outcomes.


PHP Internals News: Episode 59: Named Arguments


In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about his Named Parameter RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:18

Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 59. Today I'm talking with Nikita Popov about a few RFCs that he's produced. Hello Nikita, how are you this morning?

Nikita Popov 0:35

Hey Derick, I'm great. How are you?

Derick Rethans 0:38

Not too bad, not too bad today. I think I made a decision to stop asking you to introduce yourself because we've done this so many times now. We have quite a few things to go through today. So let's start with the bigger one, which is the named arguments RFC. We have in PHP eight already seen quite a few changes to how PHP deals with set up and things like that: we have had argument promotion in constructors, we have the mixed type, we have union types, and now named arguments, I suppose, built on top of that again. So what are named arguments?

Nikita Popov 1:07

Currently, if you're calling a function or a method you have to pass the arguments in a certain order. So in the same order in which they were declared in the function or method declaration. And what named arguments or parameters allow you to do is to instead specify the argument names when doing the call. Just taking the first example from the RFC, we have the array_fill function, and the array_fill function accepts three arguments. So you can call it like array_fill( 0, 100, 50 ). Now, what does that actually mean? This function signature is not really great, because you can't really tell what the meaning of each parameter is and in which order you should be passing them. So with named parameters, the same call would be something like: array_fill, where the start index is zero, the number is 100, and the value is 50. And that should immediately make this call much more understandable, because you know what the arguments mean. And this is really one of the main motivations or benefits of having named parameters.
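Written out, the array_fill call from the RFC looks roughly like this. The sketch uses the parameter names from the function's signature in released PHP 8.0, where "the number" is called `count`.

```php
<?php
// Positional call: the meaning of each number is not obvious at the call site.
$positional = array_fill(0, 100, 50);

// Named call (PHP 8.0+), using the parameter names from array_fill()'s signature.
$named = array_fill(start_index: 0, count: 100, value: 50);

var_dump($positional === $named); // bool(true)
```

Both calls build the same array of 100 elements, each set to 50, starting at index 0.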

Derick Rethans 2:20

Of course developers that use an IDE already have this information available through an IDE. But of course named arguments will also start working for people that don't have, or don't want to use an IDE at that moment.

Nikita Popov 2:31

At least in PhpStorm, there is a feature where you can enable these argument labels, typically only for constants. This would basically move this particular information into the language, but I should say that of course this is not the only advantage of having named parameters. So making code more self documenting is one aspect, but there are a couple more of them. I think one important one is that you can skip default values. So if you have a function that has many optional arguments, and you only want to say change the last one, then right now you actually have to pass all the arguments before the last one as well, and you have to know: Well, what is the correct default value to pass there, even though you don't really care about it.
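A small sketch of skipping defaults, using `htmlspecialchars()` and its last parameter `double_encode`. The flags written out in the positional call are an assumption about what the defaults are, which is exactly the kind of knowledge named arguments make unnecessary.

```php
<?php
$input = '&amp; <b>';

// Positional: $flags and $encoding have to be spelled out just to reach
// the fourth parameter. The defaults differ between PHP versions.
$old = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, 'UTF-8', false);

// Named (PHP 8.0+): only the parameter of interest is passed,
// everything else keeps its declared default.
$new = htmlspecialchars($input, double_encode: false);

echo $new, "\n"; // &amp; &lt;b&gt;
```

With `double_encode` set to false, the existing `&amp;` entity is left alone while the angle brackets are still escaped.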

Derick Rethans 3:19

If I remember correctly, there are a few functions in PHP's standard library, where you cannot actually replicate the default value with specifying an argument value, because they have this really complex and weird kind of behaviour.

Nikita Popov 3:33

That's true, but that's something we're trying to eliminate in PHP eight mostly.

Derick Rethans 3:39

And of course additionally, you'd never have to remember whether in_array and array_search have needle or haystack first, which is also beneficial.

Nikita Popov 3:46

That's true. Yeah.

Derick Rethans 3:48

You mentioned that there are a few other benefits as well. You mentioned self documenting and the skipping of arguments, what other benefits are there?

Nikita Popov 3:54

The other part is that you can also reorder the parameters. So this varies a little bit by language. In some languages you're required to still pass the arguments in the same order they were declared, even if you're using named parameters. But for the purposes of PHP, we would allow passing them in arbitrary order. Just like you said, you don't have to remember if the haystack is first, or the needle comes first. And I think one case where all of these benefits play together particularly well is when it comes to object construction. So you already mentioned that we have the constructor promotion RFC in PHP eight, which makes it pretty simple to declare value objects. So you just list all the available properties and their default values and types in the constructor and you're done. But when you actually instantiate the object, the ergonomics are not particularly good, because you have to remember in which order you have to pass the parameters, and you don't really know which parameter is which just looking at the call. And once again, you have to specify everything and you can't just skip a few of them with default values. And if you have like a constructor with maybe five or six arguments coming in, which is maybe unusual for normal methods, but I think somewhat normal for constructors in particular, then the current development experience there is just not very nice. And named parameters would essentially provide us something akin to an object initialization syntax which is available in many other languages, and which has also been proposed for PHP previously. But you would get this just as a side effect of combining constructors and named parameters, without having to define any kind of special semantics for how object construction works, and how initializer syntax interacts with constructors and so on.
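Both points, reordering and object construction, can be sketched together. The `User` class and its properties are hypothetical, invented for this illustration. The `in_array()` and `array_search()` calls use the parameter names from their PHP 8 signatures.

```php
<?php
// Hypothetical value object using constructor promotion (PHP 8.0+).
final class User
{
    public function __construct(
        public string $name,
        public int $age = 0,
        public string $country = 'unknown',
        public bool $active = true,
    ) {}
}

// Arguments in arbitrary order; $age and $active keep their defaults.
$user = new User(country: 'NO', name: 'Alice');

// No need to remember whether needle or haystack comes first.
$list  = ['a', 'b', 'c'];
$found = in_array(haystack: $list, needle: 'b');
$key   = array_search(haystack: $list, needle: 'b');

var_dump($user->name, $user->age, $found, $key);
```

The constructor call reads much like an object initializer, without any dedicated initializer syntax in the language.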

Derick Rethans 5:55

That ties in again with the object ergonomics that I spoke about with Larry earlier this season as well.

Nikita Popov 6:01

Yeah, I believe that this combination of constructor promotion and named parameters for constructors was one of the things.

Derick Rethans 6:10

We've spoken a little bit about what it is. Now, how would you use this in PHP, what is the syntax that you're proposing?

Nikita Popov 6:18

I mean, syntax is always a bikeshedding question. The particular one I am proposing for now is to write the parameter name as a literal, so no dollar in front of it or something, and then a colon and the value you want to pass.
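As a small sketch of the syntax described here, assuming PHP 8.0 or later: the parameter name is written as a bare identifier, followed by a colon and the value. The `contains()` helper is hypothetical.

```php
<?php
// Hypothetical helper; callers refer to its parameter names.
function contains(string $haystack, string $needle): bool
{
    return $needle === '' || strpos($haystack, $needle) !== false;
}

// Positional call: the order has to be remembered.
$positional = contains('internals', 'tern');

// Named call: no dollar sign in front of the name, and the order is free.
$named = contains(needle: 'tern', haystack: 'internals');
```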

Derick Rethans 6:35

Is there any precedent for this syntax already, either in PHP or outside of PHP?

Nikita Popov 6:41

In PHP, not really. I mean, in PHP we usually use the double arrow to have any kind of key value mapping. This is sort of a key value mapping. In other languages, yes, the syntax does exist. I'm actually not sure which languages exactly use it. Probably C sharp and Kotlin. Python uses just an equal sign. Well, there are a couple who use it. I actually initially used the double arrow syntax because it's more familiar in PHP, but I found that it doesn't really read as nicely. And I also have some ideas on how we can, like, integrate this colon syntax into the language in a more consistent way.

Derick Rethans 7:27

I think I saw in the RFC that the only way you can specify the keys is by literal and not by a variable.

Nikita Popov 7:34

That's right. This is mainly just to avoid confusion. Well, if you allow specifying a variable, then the question is: is this variable just the parameter name? Because, I mean, in the signature you also write this as a variable. Or is it a variable that contains the parameter name, like variable variables in PHP? So I think to sidestep that confusion, we just allow identifiers, but you can still use variable parameter names through the argument unpacking syntax.

Derick Rethans 8:04

How does that work?

Nikita Popov 8:05

So PHP supports the three dots, the ellipsis operator, both in the function declaration and for function calls. In the declaration, that just means collect all the trailing arguments. And at the call, it means that you have an array, and the elements of this array should be interpreted as function arguments. Named parameters extend that by also allowing array keys. If you unpack an array with string keys, then those will be interpreted as parameter names, and will use the usual named parameter passing semantics.
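The unpacking behaviour described above can be sketched as follows, assuming PHP 8.0 or later. The `connect()` function and its parameters are hypothetical; the point is that a parameter name held in a variable can still be used as a string key.

```php
<?php
// Hypothetical function with two optional parameters.
function connect(string $host, int $port = 5432, bool $ssl = false): string
{
    return sprintf('%s:%d%s', $host, $port, $ssl ? ' (ssl)' : '');
}

// The parameter name comes from a variable here.
$paramName = 'ssl';
$args = ['host' => 'db.example', $paramName => true];

// String keys are interpreted as parameter names; $port keeps its default.
$dsn = connect(...$args);
```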

Derick Rethans 8:47

Interesting. I actually missed that while reading the RFC. To be fair, I skimmed it, not really read it. Yeah, it's good to see that actually. Now, people currently use positional arguments and not named arguments. How would these two interact?

Nikita Popov 9:01

Mostly, the named parameters are just syntax for positional arguments, so we perform an internal transformation to convert named parameters into positional parameters. As far as both the engine and the callee are concerned, they don't really know about named parameters at all. They see a usual positional call where all the missing arguments have been filled in with default values. I think the only part to watch out for there is exactly this case of variadics, because previously, the variadic parameter could only contain a list of arguments, and now it can also have string keys, for left over named parameters. So those which did not have a matching parameter in the function signature will now also get collected into the variadic parameter. I think that's the only case I know of where the calling convention really changes for the recipient of the arguments.
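A short sketch of the variadics case, assuming PHP 8.0 or later. The `collect()` function is hypothetical: named arguments that match no declared parameter end up in the variadic array under string keys.

```php
<?php
// Hypothetical function: one regular parameter plus a variadic one.
function collect(string $first, ...$rest): array
{
    return $rest;
}

// 'b' and 'c' are extra positional arguments; flag and mode match no
// declared parameter, so they are collected with their names as keys.
$rest = collect('a', 'b', 'c', flag: true, mode: 'fast');
```

Code that assumed `$rest` was always a plain list would now also see the `flag` and `mode` keys.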

Derick Rethans 10:02

Because where they otherwise got a normal array, they now get a bunch of things potentially having keys in there as well. What would happen if I specify a named argument by name and also include it in the variadics?

Nikita Popov 10:15

So generally the rule is always that you can pass a parameter at most once. You can have the situation where you first pass some positional arguments, and then you pass named arguments. If you do that, a named argument cannot clash with a previously passed positional argument. If you run into this kind of situation, we will always throw an exception at that point. So you're not allowed to overwrite the previous argument, or something like that.
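The "at most once" rule can be sketched like this, assuming PHP 8.0 or later, where the clash surfaces as an `Error`. The `greet()` function is hypothetical.

```php
<?php
// Hypothetical function used to provoke the clash.
function greet(string $greeting, string $name = 'world'): string
{
    return "$greeting, $name";
}

$clashed = false;
try {
    // 'Hello' already fills $greeting positionally, so naming it again clashes.
    greet('Hello', greeting: 'Hi');
} catch (Error $e) {
    $clashed = true;
}

// Mixing is fine as long as every parameter is passed only once.
$ok = greet('Hello', name: 'Derick');
```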

Derick Rethans 10:42

The same would apply if a method would collect named arguments and also have the variadics array, in case you specify more arguments than the function would take. And if in the variadics you'd have that name again, that would have already clashed before it even gets turned into a variadic. Are the names that you give to named arguments case sensitive or case insensitive?

Nikita Popov 11:04

They are case sensitive. Because the parameters you specify in the function are just variables and variables in PHP are case sensitive as well.

Derick Rethans 11:14

At the moment if you inherit a method in an inheriting class, then it doesn't particularly matter what the names of these method arguments are. When you now get named arguments, is this going to change? Because at the moment PHP doesn't enforce that the parameter names of inheriting methods are the same as those of the methods they override in the parent class.

Nikita Popov 11:37

This is one of the bigger open questions we have. The problem is that if you call a method with the names from the parent class, and the child class changed them, then you'll get an error because this named parameter just doesn't exist in the child class. And there are a couple of ways to approach that. One is to forbid any kind of parameter name changes during inheritance, which would be a fairly significant backwards compatibility break because, well, it never mattered in the past and, based on some cursory analysis, parameter name changes are somewhat common in code right now. The other possibility is to just ignore this issue and expect that a lot of code is never going to use named parameters. Using named parameters only makes sense with some types of methods. If you have a method that only accepts one argument, you can be pretty sure that no one's going to call it with a named parameter. So there is the option of just ignoring this issue and fixing it as it comes up, more or less. Which is maybe not the most principled approach. But if we look at other languages that do make heavy use of named parameters, for example Python, we see that they also just ignore the problem. So it looks like in practice this does work out. Of course, a significant difference there is that Python has had named parameters for a long time already, and we would be retrofitting them onto an old language. So the situation is somewhat different and probably rather more dangerous for us.

Derick Rethans 13:14

This is something of course that static analysis tools can check for quite easily and I would argue that they probably should start doing that as well.

Nikita Popov 13:22

That's right, so this is both something easy to check for, and also easy to automatically fix.

Derick Rethans 13:28

Except that you need to choose which one is the correct name, of course.

Nikita Popov 13:32

Yeah, that's right.

But there is one more possibility, which is to allow the parameter names from both the parent method and the child method. This would be more or less a transparent way to fix that issue. The only problem you can run into is if both the parent method and the child method use the same parameter name but in a different position. If we would go with this option, then we'd say that only in this particular case, where a parameter name is reused but in a different position, that would become an inheritance error.
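The inheritance problem discussed above can be sketched as follows. This assumes PHP 8.0 or later, which ended up not enforcing parameter names during inheritance, so the mismatch only shows up at call time. The `Finder` interface and `ListFinder` class are hypothetical.

```php
<?php
// Hypothetical interface whose parameter is called $needle.
interface Finder
{
    public function find(string $needle): bool;
}

// The implementation renames the parameter, which is allowed.
final class ListFinder implements Finder
{
    public function find(string $value): bool
    {
        return in_array($value, ['a', 'b'], true);
    }
}

$finder = new ListFinder();

// Calling with the interface's name fails, as $needle does not exist here.
try {
    $finder->find(needle: 'a');
    $outcome = 'ok';
} catch (Error $e) {
    $outcome = 'error';
}

// Calling with the implementation's own name works.
$found = $finder->find(value: 'a');
```

A static analysis tool could flag the renamed parameter, as mentioned in the conversation.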

Derick Rethans 14:04

I quite like that actually, because that's a pragmatic approach isn't it?

Nikita Popov 14:07

I also quite like it, maybe it's just technically a bit problematic.

Derick Rethans 14:11

I can already imagine that if this gets accepted for PHP eight, which of course is not sure at the moment, that Xdebug is going to have to show the variadics with the named array elements, which of course it doesn't do yet because it has no notion of them. But it's good to have a heads up on these things.

PHP eight has already seen quite a lot of work for internal methods to get their names properly recorded as well, with the stubs that you have already been working on. How do named arguments tie in with this?

Nikita Popov 14:38

The actual named arguments proposal is already pretty old. It dates back to PHP 5.6, I think, and one of the open questions since then was how we handle internal functions, because they don't really have a notion of default values. We have optional parameters, but the default value is not known to the engine, it's only known to the implementation. There are kind of ways to work around that. They are not really safe, so they will work for most functions, but for some, which do things like argument counts, we might end up just crashing if the function is used with named parameters in a particularly weird way. One of the nice things in PHP eight is that thanks to the stub effort we actually have default values for functions available as metadata, so it's available for reflection, and we would also be able to use this for named parameters. If an internal function parameter has been skipped, we can essentially fetch it from reflection and fill in the value, the same way we would do for normal user functions. The issue there is that this only works if there are stubs available. This works for all of our internal functions, I mean, not internal but bundled functions for PHP, but it will not work out of the box with old extensions. So it will mostly work, just this kind of parameter skipping is not going to work. So it will give you an error like: okay, we don't have default information for this function so you can't call it like this.
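As a sketch of parameter skipping on a bundled function, assuming PHP 8.0 or later: `htmlspecialchars()` takes `flags` and `encoding` before `double_encode`, and the call below leaves both at their defaults.

```php
<?php
$input = 'a &amp; b < c';

// Default behaviour: the existing entity is encoded a second time.
$doubleEncoded = htmlspecialchars($input);

// Skip $flags and $encoding, and only set $double_encode by name.
$singleEncoded = htmlspecialchars($input, double_encode: false);
```

The skipped parameters are filled in from the default value information the engine has for the function.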

Derick Rethans 16:17

There's this common myth saying that reflection is actually a very slow thing, you should never use this in your code. Is this going to be a concern for using reflection information this way for internal functions?

Nikita Popov 16:29

Well, I mean, we won't be directly using reflection itself, but internal APIs that do the same thing. There is a performance concern here because we store the default values not as values but as strings. So, in the worst case we actually have to parse those strings, convert them into a syntax tree, validate the syntax tree, and all that. That's of course slow, but it's not like we can't add a bit of caching in there to make sure this only happens once, at which point the problem should be avoided.

Derick Rethans 17:02

Especially when you use things like opcache.

Nikita Popov 17:04

I should say that I do expect named parameter calls to be generally slower than positional calls, so maybe in super performance critical code you would stick with the positional arguments.

Derick Rethans 17:16

I mean, it would work perfectly well for object construction still, right?

Nikita Popov 17:19

For object construction the real cost is really in the object allocation and so on.

Derick Rethans 17:24

With the introduction of named arguments, are there going to be any BC breaks, potentially?

Nikita Popov 17:29

There are not going to be any direct BC breaks, but there are of course some concerns. The first one is the change I mentioned about the variadics, that variadics can now have string keys. But I should clarify what I mean by no BC breaks: if you don't use named arguments, then nothing is going to break. But of course, if named arguments are used with code that did not expect them, then we can run into some issues. So that's one of the issues. And the other one is more of a long term maintenance concern: if we introduce named parameters, then parameter names become significant to the API, which means you cannot rename parameters in minor versions of a library if you're semver compatible, because you might be breaking some code using those parameter names. And I think one of the biggest concerns that has come up in the discussion is that this is a significant increase in the API burden for open source libraries.

Derick Rethans 18:34

Because now suddenly, they have to think about the names of the arguments to all their methods as well, right.

Nikita Popov 18:39

So I think, like, the merits of this proposal mostly come down to how much additional burden this imposes on people maintaining libraries, versus how much of an ergonomics improvement we get out of the feature for everyone else. One more thing to consider is that named parameters really change how you design APIs, or what APIs you can reasonably design. So right now if you have a method with, for example, three boolean arguments, that would be a really horrible method, because you call it like: true, true, false. Like, what does this mean? If you have named parameters, and you have the same three boolean arguments, then it's not really a problem any more, because you can say what each argument means and you can leave out arguments that you don't want to modify.
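The three booleans example can be sketched like this, assuming PHP 8.0 or later. The `toCsv()` function and its flags are hypothetical.

```php
<?php
// Hypothetical function with three boolean switches.
function toCsv(
    array $rows,
    bool $header = true,
    bool $quoteAll = false,
    bool $trailingNewline = false
): string {
    if (!$header) {
        array_shift($rows); // drop the header row
    }
    $lines = [];
    foreach ($rows as $row) {
        $cells = $quoteAll
            ? array_map(fn ($cell) => '"' . $cell . '"', $row)
            : $row;
        $lines[] = implode(',', $cells);
    }
    return implode("\n", $lines) . ($trailingNewline ? "\n" : '');
}

$rows = [['id', 'name'], [1, 'Ann']];

// Positional: what do true, true, false mean?
$opaque = toCsv($rows, true, true, false);

// Named: says what it changes and leaves the other flags alone.
$clear = toCsv($rows, quoteAll: true);
```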

Derick Rethans 19:30

You mentioned that this RFC is quite old already. Do you think this will make it into PHP eight, as we're getting closer and closer to feature freeze, we're not quite there yet we have another month or so to go. Do you think it's ready enough to throw to the lions, so to speak?

Nikita Popov 19:46

So I think I will at least give it a try, because I do think that PHP eight is a good target for such a change. Even though it nominally does not break backwards compatibility, it does have a very significant impact in practice, so it would be good to put this in a major version. And additionally, we also did all this work on stubs in PHP eight, which this also fits in with very well. Oh, and finally, one thing I didn't mention before is that we get attributes in PHP eight. And attributes, firstly, replace the existing Doctrine annotation system, which already supports named parameters.

For all the code that is now going to migrate from Doctrine Annotations to PHP Attributes, it would be helpful if we had named parameters, because it would make the migration a lot more straightforward, because you don't also have to change the meaning of the arguments at the same time.
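A minimal sketch of named arguments inside an attribute, assuming PHP 8.0 or later. The `Route` attribute class and the `home()` function are hypothetical and not taken from any library.

```php
<?php
// Hypothetical attribute class; its constructor parameters are the names
// that can be used where the attribute is applied.
#[Attribute(Attribute::TARGET_FUNCTION)]
final class Route
{
    public function __construct(
        public string $path,
        public array $methods = ['GET'],
        public ?string $name = null,
    ) {}
}

// Arguments are given by name, and $methods is left at its default.
#[Route(name: 'home', path: '/')]
function home(): string
{
    return 'hello';
}

// Reflection instantiates the attribute with the named arguments.
$attribute = (new ReflectionFunction('home'))->getAttributes(Route::class)[0];
$route = $attribute->newInstance();
```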

Derick Rethans 20:51

I'm curious to see what the reception of this will be, especially when it is going to be voted for.

Nikita Popov 20:57

Yeah me as well. I never did get this to voting, the last time around, but we should at least get a vote this time and well if it doesn't go through then there is always next time.

Derick Rethans 21:10

There's always next time, yes. Okay Nikita, thank you for taking the time this morning to talk to me about named arguments.

Nikita Popov 21:17

Thanks for having me Derick.

Derick Rethans 21:20

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 58: Non-Capturing Catches

In this episode of "PHP Internals News" I chat with Max Semenik (GitHub) about the Non-Capturing Catches RFC that he's worked on, and that's been accepted for PHP 8, as well as about bundling, or not, of extensions.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:18

Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 58. Today I'm talking with Max Semenik about an RFC that is proposed called non capturing catches. Hello Max, would you please introduce yourself.

Max Semenik 0:38

Hi Derick. I'm an open source developer, working mostly on MediaWiki. So that's how I came to be interested in contributing to PHP.

Derick Rethans 0:50

Have you been working with MediaWiki for a long time?

Max Semenik 0:53

Something like 11 years, I guess.

Derick Rethans 0:56

That sounds like a long time to me. The RFC that you've made, what is the problem that it is trying to address?

Max Semenik 1:03

In current PHP, you have to specify a variable for exceptions you catch, even if you don't need to use this variable in your code, and I'm proposing to change it to allow people to just specify an exception type.

Derick Rethans 1:20

At the moment, the way you catch an exception is by using catch, opening parenthesis, exception class, variable, and you're saying that you don't have to do the name of the variable any more. Did I get that right?

Max Semenik 1:33

Yes.
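The change can be sketched as follows, assuming PHP 8.0 or later. The `safeDivide()` function is hypothetical; the catch clause names only the type and no variable.

```php
<?php
// Hypothetical helper: the caught error object itself is never needed.
function safeDivide(int $a, int $b): ?int
{
    try {
        return intdiv($a, $b);
    } catch (DivisionByZeroError) {
        // Before PHP 8 this had to be written with a variable,
        // as in: catch (DivisionByZeroError $e)
        return null;
    }
}

$fine = safeDivide(6, 3);
$none = safeDivide(1, 0);
```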

Derick Rethans 1:34

Is that pretty much the only change that this is making?

Max Semenik 1:38

Yes, it's a very small, and well defined RFC. I just wanted to do something small, as my start to contributing to PHP.

Derick Rethans 1:51

I'm reading the RFC, and it also states that there used to be an earlier RFC. How does that differ from the one that you've proposed?

Max Semenik 2:00

The previous RFC wanted to also permit a blanket catching of exceptions, as in catching anything at all, which, understandably, has caused some objections from the PHP community, while most people commented positively on the part that I'm proposing now. Or should I say proposed, because the RFC passed and was merged yesterday.

Derick Rethans 2:35

I had forgotten about it actually, it's good that you reminded me. So yeah, it got merged and is ready for PHP eight. Basically what you're saying is that you picked the non-controversial parts of an earlier RFC?

Max Semenik 2:47

I actually chose something to contribute and then looked for an RFC, to see if it was discussed previously.

Derick Rethans 2:55

Oh, I see. So your primary idea was wanting to contribute to PHP, instead of you having an itch that you wanted to scratch, is that what you're saying?

Max Semenik 3:04

I have way larger itches that I will scratch later, when I have learned how to work with PHP's code base, which is really huge.

Derick Rethans 3:16

That makes some sense I suppose. When looking at the vote for the RFC I actually couldn't see that you had voted for it yourself. Did I miss something?

Max Semenik 3:25

I don't have a php.net account so I can't vote for myself, obviously.

Derick Rethans 3:31

I actually think you can because you have written an RFC.

Max Semenik 3:35

I haven't seen any interface to vote.

Derick Rethans 3:38

Interesting. It's actually something to catch up on because I'm pretty much sure that you can. We should investigate that for some other RFCs that are still open, because I think you should be able to.

Max Semenik 3:49

Would be nice. I mean, this wouldn't change anything but...

Derick Rethans 3:54

That's true, but I mean, you've started contributing. If you're able to vote, that's the fair thing to do, I suppose. So as you said, this is your first contribution to PHP itself. How did you find the whole process of getting this going and getting started with it?

Max Semenik 4:10

As far as running an RFC goes, it was fairly straightforward to me. Maybe because I was looking at PHP RFCs in the past, so I knew how the process worked and it was really something that I already knew how to navigate. It's not the first open source community I'm contributing to, so I kind of know what to do in general.

Derick Rethans 4:40

How large is the MediaWiki community?

Max Semenik 4:43

It's probably larger than the PHP community in terms of actively contributing people, as the Wikimedia Foundation has lots of paid programmers that work on the ecosystem. Obviously the outreach of your community is larger than MediaWiki's.

Derick Rethans 5:08

You're saying that there's more people working on it. But there's more people using PHP?

Max Semenik 5:15

And more people actively interested in development.

Derick Rethans 5:21

Do you think that's because it's easier to contribute to something that's written in PHP, than PHP itself?

Max Semenik 5:28

Not a lot of people know how to program in C these days. And while I used to be paid for writing C, my C's currently extremely rusty. Unlike PHP, for example.

Derick Rethans 5:44

For me it's sort of the other way around, because I haven't been writing PHP code for quite some time now, except for some test cases, so I know nothing about frameworks whatsoever. I know C pretty well. In any case, we now have one more active contributor, that is you. You've had things merged, and that makes you a contributor in my eyes. This is a pretty small RFC. And I think during the course of the last few months I've discussed with several other contributors that small RFCs are a good thing, because it makes it much harder to find problems with them. There are a few other RFCs as well that are also small, and for which the authors declined to talk to me for various different reasons. And two of those are actually really, really simple things, and they both have to do with the bundling of extensions in PHP. Now, just thinking about this question: how does MediaWiki, for example, think about which extensions it can use in its source code?

Max Semenik 6:45

For MediaWiki, first of all, on start-up MediaWiki quickly checks if all the hard required extensions are available, and it just bails out if they aren't available. I need to look at whether it checks for JSON, or assumes it's way too obvious to even consider whether it's present or not.
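A start-up check of this kind might look roughly like the sketch below. This is not MediaWiki's actual code; the `missingExtensions()` helper and the extension list are hypothetical, and only `extension_loaded()` is a real PHP function.

```php
<?php
// Hypothetical helper: return the required extensions that are not loaded.
function missingExtensions(array $required): array
{
    return array_values(array_filter(
        $required,
        fn (string $extension): bool => !extension_loaded($extension)
    ));
}

// 'standard' is part of every PHP build; the second name is made up
// so that the check has something to report.
$missing = missingExtensions(['standard', 'no_such_extension_xyz']);

if ($missing !== []) {
    // A real application would stop here with a clear error message.
    $report = 'Missing PHP extensions: ' . implode(', ', $missing);
}
```

If an extension such as JSON is always part of PHP, it no longer needs to appear in a list like this.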

Derick Rethans 7:10

So you just mentioned the JSON extension. That makes sense because that's one of my notes. One of the RFCs, as you just alluded to, is about the JSON extension, and PHP eight will have this always available now without you having to enable it with configure flags, which is a pretty good way of making sure that the extension is always available to everybody using PHP. Do you agree that having a JSON extension always available is a good idea for PHP?

Max Semenik 7:37

Yes, absolutely. One of the aspects of writing software that's available for everyone to use, as opposed to some internal company software that's running on a few servers and that's it, is that you need to support a wide variety of systems. And if it's possible to compile PHP without JSON, it means that someone will compile it without it. It also means that some Linux distribution developers will package it as a separate package, and then someone will not install it, and you will get people complaining that MediaWiki doesn't work on their system. The more very popular extensions are available, the better. If I know that the many popular extensions that I need are always available, it makes my job easier and it also allows me to write better software, without having to resort to hacks and decreased functionality.

Derick Rethans 8:52

And what some other frameworks do is they start making polyfills for them.

Max Semenik 8:56

And these polyfills might have, like, orders of magnitude worse performance. If I can have guarantees that a system has JSON, as well as other extensions like mbstring, intl, and so on, it would be really awesome.

Derick Rethans 9:16

The argument is always between: do we always want to have everything inside PHP or not? And at some point you need to start making a distinction about whether this is useful enough for everybody or just for a smaller group of people, and mbstring is probably an example where this is sort of on the line, right? I mean, it's useful enough, but is it useful enough to have it always enabled instead of having it easily installed as a package?

Max Semenik 9:42

Well, you know, lots of people are running software, whether it's MediaWiki, whether it's some WordPress or something else, on crappy shared hosting, which is the bane of every programmer's existence, but they still have to support it. The question is really: if something can be messed up, some people will have to run on systems that have it messed up. And if we can avoid it, why not?

Derick Rethans 10:11

Another RFC that's just gone through is about unbundling an extension. Some versions of PHP will have extensions being brought into core and being always made available, like we did with the hash extension in PHP seven four. But of course we're also removing extensions from PHP to live somewhere else. Not even having them always enabled, but not even having them distributed with the PHP source code. In PHP seven four we had for example the InterBase extension, I believe, because there weren't a lot of people using it. In this case we're having the XMLRPC extension. Have you ever heard of this XMLRPC extension, because you said you've been programming PHP for a while?

Max Semenik 10:51

I've heard about the protocol itself and I might have heard about PHP having this extension, but I've never used it, and honestly I don't know why anyone would be using it.

Derick Rethans 11:04

It was sort of being used a little bit when people really didn't want to use SOAP, because it was too complicated. But that was pretty much before we had invented JSON. That's a long, long time ago.

Max Semenik 11:18

These days, XMLRPC sounds like a legacy corporate system. That's probably why it's no use having it in PHP proper.

Derick Rethans 11:32

I think I very much agree there. In any case, non-capturing catches are in PHP eight. You said that the RFC was accepted, has the patch been merged as well?

Max Semenik 11:41

Yep.

Derick Rethans 11:42

Great. I'm going to give a talk next month for the Dutch PHP Conference, where I'm talking about the new additions in seven four, but also what's coming up in eight dot zero, so I might be able to have a slide about it in there.

Max Semenik 11:57

Awesome.

Derick Rethans 11:58

Thank you, Max, for taking the time today to talk to me about non-capturing catches and bundling of extensions.

Max Semenik 12:05

Thank you, Derick for giving me this tribune. It was a nice talk.

Derick Rethans 12:09

Excellent. Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 57: Conditional Codeflow Statements

In this episode of "PHP Internals News" I chat with Ralph Schindler (Twitter, GitHub, Blog) about the Conditional Return, Break, and Continue Statements RFC that he's proposed.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:17 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 57. Today I'm talking with Ralph Schindler about an RFC that he's proposing titled "Conditional return break and continue statements". Hi Ralph, would you please introduce yourself.

Ralph Schindler 0:37 Hey, thanks for having me Derick. I am Ralph Schindler, just to give you, I guess, the 50,000 foot view of who I am. I've been doing PHP for 22 years now, ever since the PHP three days. I've worked in a number of companies in the industry. Before I broke out into sort of knowing other PHP developers I was a solo practitioner. After that I went to work for 3Com, and that was kind of a big corporation. After that I moved to Zend. I worked in the framework team at Zend, and then after that, I worked for another company based out of Austin, for a friend of mine, Josh Butts. That's Offers.com; we've been purchased since then by Ziff Media. I'm still kind of in the corporate world. Ziff Media owns some things you might have heard of: PC Magazine, Mashable, Offers.com. The company that owns us is called J2; they are jFax. They keep buying companies, so it's interesting. I get to see a lot of different products and companies that get bought and kind of get folded into the umbrella, and it's an interesting place to work. I really enjoy it.

Derick Rethans 1:39 Very different from my non enterprise gigs

Ralph Schindler 1:43 Enterprise is such an abstract word, and, you know, it's kind of everybody's got different experiences with it.

Derick Rethans 1:49 Let's dive straight into this RFC that you're proposing. What is the problem that this RFC is trying to solve?

Ralph Schindler 1:54 This is actually kind of the bulk of what I want to talk about, because the actual implementation of it all is extremely small. As it turns out it's kind of a heated and divided topic. My Twitter blew up last weekend after I tweeted it out, and some other people retweeted it, so it's probably interesting. I really had to sit down and think about this one question you've got, which is what is it trying to solve. First and foremost, it's something I've wanted for a really long time, a couple of years.

Two weekends ago I sat down, and it was a Saturday, and I'm like, you know what, I haven't hacked on the PHP source in such a long time. The last thing I did was the colon colon class thing, and that was like seven or eight years ago. And again, I got into that because I really wanted the challenge of digging into the lexer and all that stuff and, incidentally, you know, I load the PHP source in Xcode, and my workflow is: I like to set breakpoints in things, and I like to run something, and I look in the memory and I see what's going on, and that's how I learn about things. And so I wanted to do that again. And this seemed like a small enough project where I could say, you know, this is something I want to see in the language, let me see if I can hack it out. First and foremost, I want this. And, you know, it's a simple thing.

So what it exactly is: it's basically at the statement level of PHP, it is what they like to call a compound syntactic unit. Something that changes the statement in a way that I think probably facilitates more meaning and intent, and sometimes, not always, it'll do that in fewer lines of code. To kind of expand on that, this is a bit of a joke, but a couple of years ago there was that whole argument online about visual debt. I don't know if you remember hearing that terminology.

Derick Rethans 3:34 Yep.

Ralph Schindler 4:47 Foo.

Derick Rethans 4:23 Up to now we haven't spoken about but the RFC is proposing so maybe we should talk about it first and then get back to other things that he said have you spoken a little bit about the reasons why you want to change something. But what would you like to add to PHP or, or what would you like to modify in PHP?

It's, you know, it's, it's very closely related to what in computer science is called a guard clause, and I used that phrase lightly when I originally brought it up on the mailing list but it's very closely aligned to that, it's not necessarily exactly that, in terms of the syntax. In terms of like when you speak about it in the PHP code sense, it really is sort of a change in the statement; so putting the return before the if. That's really what it is. So guard clause, it's important to know what that is, is it's a way to interrupt the flow of control, you know, over the history of programming languages.

Ralph Schindler 7:19 Let's just go back to Pascal. In Pascal, like 50 years ago, there was no opportunity to exit early from either a loop or a method, so you had to wait until you got to the very final statement, and there was a single exit from a function. Guard clauses give you, if you're inside of a block of code, or a loop, or some kind of flow of control, an opportunity to say: I want to exit here instead of continuing on. They did a whole bunch of studies on Pascal and they found out that students couldn't come up with the right solution when, let's say, you had a loop statement that had to execute 100 times and there was no opportunity to get out early. When you gave them the opportunity to interrupt the flow of control, the correctness of their solutions ultimately got better. Almost 100% of the time they were able to say: you know what, this is an exceptional piece of code, I want to exit here.

Fast forward: guard clauses, if you've kind of followed the Kent Becks and the Martin Fowlers, they would argue for guard clauses. You know, over time that's gotten more popular as an argument, over the past, let's just say, 15 years in our industry.

Derick Rethans 8:23 Would another term for this be like an early return?

Early returns are one of them, early breaks, and early continues, so getting to a place in code where you just say you know what this, there's a particular condition, in this normal flow of execution, I want to stop that normal flow and I want to break out of it. Goto is another tool that allows you to do this. I don't know if you can do it inside of loops, maybe you can. There's like some exceptions in PHP where you can jump to and from,

You can jump out of loop, but you can't jump into one.

To some degree, these tools do sort of exist. Goto, another heated topic in the PHP world. So getting back to what the guard clause is. More specifically, it is very closely and semantically aligned with a Boolean expression. You will generally say: I want to either return, break, or continue, based off of this Boolean. PHP itself does not have first class support for guards. The way we achieve it currently is: we put the Boolean expression first, as part of an if with a block of code associated with it, so: if, curly brace, block of code, which might terminate in an early return. Inside of switch statements or loops, you'll see: if something something something, continue one, continue two, or break one, break two. An if expression, along with a return, break, or continue, is the way we achieve it in PHP. This is kind of giving first class support to a guard clause. It would be spelled out in the manual, and it would be a tool that, since it has a name and it is in the language, programmers could reach for and say: I know what that is, or: here's what it is in the manual, how do I use that? That's kind of, you know, what a guard clause is.
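For readers following along in text, here is a minimal sketch of the three kinds of guard clause as they can be written in PHP today: an early return, an early continue, and an early break. The function name and data are hypothetical and only there for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical example: add up the positive values in a list,
// stopping completely at the first value above $limit.
function sumPositives(array $values, int $limit): int
{
    // Guard clause as an early return: nothing to do for an empty list.
    if ($values === []) {
        return 0;
    }

    $sum = 0;
    foreach ($values as $value) {
        // Guard clause as an early continue: skip non-positive values.
        if ($value <= 0) {
            continue;
        }

        // Guard clause as an early break: leave the loop entirely.
        if ($value > $limit) {
            break;
        }

        $sum += $value;
    }

    return $sum;
}

echo sumPositives([3, -1, 4, 100, 5], 10), PHP_EOL; // prints 7
```

In each case the Boolean expression comes first and the return, continue, or break sits inside the block, which is the layout the proposal wants to offer an alternative to.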

At the moment, as you mentioned, you can sort of implement the guard clause by doing: if, your condition, and then in curly braces return, or break, or continue, whatever you said. What is the syntax that you want to replace this with?

I don't want to replace syntax. PHP is a flexible language. We have multiple ways of doing lots of things. We have multiple ways of crafting closures and anonymous functions. We have two different ways, that have existed since the beginning of PHP's time, of doing if statements: one is broken up by the colon, with the block and the endif, or you can do it with curly braces. You've noticed that with various PSRs and whatnot, people have gravitated towards a particular coding standard. And for all intents and purposes, for the global community of programmers to have that shared diction is a good thing.

Ralph Schindler 10:50 With regards to PHP, the most important characteristic of this RFC is this: PHP is a left to right language, you know, like much of the 90-95% of the speaking world is left to right. They tend to put the emphasis, especially in coding, the precedence, on the left side. So this moves the return keyword to the left side of a statement or syntactic unit. So when you look at this code, the first thing you see is: return. In variation one, which is the one I proposed for this feature, "return" is followed by "if". What you notice is that when you look at the code you'll see "return if", and it almost looks like its own keyword. Those two individual tokens, those keywords, must align themselves closely together. You know, maybe there's like two spaces between them, but return and if are right next to each other, so they can be treated almost as a new keyword in and of itself. So as you're reading code top down, left aligned, you'll see return if, return if, and finally at the bottom of the method you'll see return. So that's variation one, and what it does is create this precedence where the static, constant keywords return and if are first. Your expression is third. Your optional return value is fourth. In most of the cases where you're writing this, it does become a one liner. That's not to say we can't do one liners today, because you can do: if, if-expression, something, return. But what happens when you look at that code is that the return value is off to the right. Optionally, whether you want to break outside of the PSR coding standards or stay with the PSR coding standards, you can do curly braces and then put the return on the next line; now you've got three lines of code, and your return is indented.
As you're visually approaching this code, what's most important to you is that there's an if statement there, but then you have to kind of scan the body of it to see if there's an early return. The fact that it's an early return becomes abundantly clear in variation one at the leftmost rail of the code, at the leftmost side of the statement, assuming you're not putting all of your code on one line.

Derick Rethans 12:59 You talk about variation one, I guess there's a variation two as well. What is the difference between them?

Ralph Schindler 13:05 As with all RFCs, people have preferences. Just as with politics in general, if you're in a political position, which this is, a political change to PHP, you have to know where your constituents are. You have to know, basically: if I want to appease the most people, what will I have to give up in order to get something that is still beneficial to me? For me right now, it is the compromise position. That's not to say I won't like it more, maybe a month from now, but effectively variation two is moving the optional return value to right after the return. Return, optional return value, then the if, and then the non-optional if expression, followed by the semicolon. So basically it would read more like English, so to speak. Return this, if this. As I understand it, it is that way in Perl. I know it's that way in Ruby. Ruby follows the same thing because the way they've implemented it is not necessarily as a single statement; they've implemented what they call statement modifiers, which means any statement can be modified with this conditional at the end of it. That's the alternative syntax. If I were to use this, I get value out of it because maybe I don't return an optional expression and then I'm still left with return if this. I still have my escape hatch for methods that have an optional return, the ability to return void.

Derick Rethans 14:26 In variation one, how do you separate out the condition with the optional return value?

Ralph Schindler 14:32 Another reason why I thought variation one was good for PHP specifically. Let's just do like two seconds of history. If you go back 20 years, C++, the way you write a method signature in C++ is: you'll do public, int, method name, typed arguments, so the return, we call them, hints, the hint for the method in C++ precedes the method.

Derick Rethans 14:55 I've just been talking to Dan Ackroyd for the podcast episode that came out last week, where he is saying that we should stop calling them hints, because they're no longer hints, they're now proper type names. Maybe we should pick that up here as well then?

Ralph Schindler 15:10 We've had that discussion for 10 years now. But people know them as hints. We have such loaded phrasing in PHP, like type coercion. Whatever we call them, I'll just continue with hints for the time being, because the audience of this particular podcast knows them as hints. The hint in C++ would have been all the way to the left of the line, whereas in PHP, when we chose to implement typing of the return values, we did it in a way where the method signature had the colon and the return type at the end of the method signature. This particular variation one follows that same pattern, where your colon and return value look exactly like the layout of the method signature, with the colon, that you see up top. There's a big parallel there with an early return with an optional return value. Also, I like optional things to be at the end. When you look at this whole statement, that's the optional part, whereas with variation two the optional part being in the middle means that return, optional part, if, and return if are both valid things. So the parallel is the method signature. That was kind of why I personally like the first one. They're both my children at this point; I love them both.
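To make the two variations easier to picture, here is a rough sketch. The proposed forms appear only inside comments, because they are not valid PHP today, and their exact spelling is my reading of the spoken description above rather than a quotation from the RFC. The function `discount()` is hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical function, used only for illustration.
function discount(int $total): int
{
    // Today: the condition comes first and the return sits inside the block.
    if ($total < 100) {
        return 0;
    }

    // Variation one as described in the discussion (NOT valid PHP today):
    //     return if ($total < 100): 0;
    //
    // Variation two as described in the discussion (NOT valid PHP today):
    //     return 0 if ($total < 100);

    return intdiv($total, 10);
}

// The colon before the optional return value in variation one mirrors
// the colon before the return type in the signature of discount() above.
echo discount(50), ' ', discount(250), PHP_EOL; // prints "0 25"
```

In variation one the optional value sits at the end of the statement, after a colon; in variation two it sits in the middle, between return and if.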

Derick Rethans 16:20 As you said, introducing syntax is always a bit tricky and it's a political choice. What has been the feedback on, or the criticism of, your suggested additional language construct?

Ralph Schindler 16:33 The smallest changes always get the most feedback, because there's such a wide audience for a change like this. They can immediately see the benefits or the negative value of it in their own code, all the way from the junior programmer up to the senior programmer. I can't quantify who's junior or senior. I also can't quantify who has been programming a long time and is, for lack of a better term, set in their ways and likes their style, versus those who have adopted a certain flexibility in the way that they develop, depending on the size of the team they're on and how much leniency they give someone else to write code that they will just, you know, code review and accept. So the interesting thing is that you have to kind of understand junior programmers and senior programmers. When junior programmers get in there and start programming, they tend to write code that is very brute force. They just write a lot of code, because in order to get better at writing code you just keep writing code. Their perspective is from the code writing standpoint; they're not looking at this from a code reading standpoint. So when you see junior programmers, they rely on ifs and loops and the rudimentary techniques: less abstraction, fewer methods, more lines of code. They tend to not break things out into well-named methods. Whereas as they grow as programmers, they start reading other people's code more, and then they do start appreciating abstraction: this 50 line thing needs to be a five line thing. It needs to have its own name as a method over here, I need to reduce the number of inputs, have very specific outputs, and so on and so forth. So it's more highly structured code. Putting a feature like this out, you get a range of perspectives from people. It goes without saying.
I mean, Taylor retweeted it; I know he has a preference for this style of programming. I know exactly where it came from. He appreciates certain things in the Ruby world. The return if statement in Ruby is a clear, concise, and very impactful statement, and to some degree he's implemented that same thing in Laravel. So if you look at the helper methods in Laravel, someone that writes Laravel applications is used to using something like abort if, or throw if. Interesting side note here: PHP is going to have a feature where you can put a throw expression following a ternary. That in and of itself allows exceptions to have a much more concise syntax. It allows you to use PHP exceptions for flow control. You still can't do that with a return value, for example; you can't have a ternary with a return in it. And I guess that is another way of being able to achieve the same thing. This idiom, going back to guard clauses and thinking about early exits of methods, was prevalent in Laravel, where you could say in a controller method, and this is specific to an HTTP context because you're inside of a controller: abort if. Abort is highly specific to HTTP, where you're going to return a 404 or a 500. It's going to throw an exception, an HTTP exception, and the framework knows to convert these kinds of exceptions into error paths in an application. So again, we're still talking about application code, not necessarily library code. So abort if and abort unless is an idiom that I've seen be a fantastic idiom for controllers. I mean, when you're thinking about a request, and PHP is highly request driven, you can see when I start this method with the request object: these are all my early outs, this is where I'm going to return, and then at the very final spot I might be returning a view, which is a successful page for this MVC application.
I feel like it was a successful idiom there, and that was also part of the reason that drove me to say: you know, it would be neat if I could just say, return response, if this condition, and have that early out.
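As an illustration of the two ideas mentioned here, the following sketch shows a throw expression inside a ternary (available from PHP 8.0) next to a small "throw if" style helper. The helper `throwIf()` and the function `findUser()` are hypothetical stand-ins written for this example; they are not Laravel's own helpers.

```php
<?php
declare(strict_types=1);

// Hypothetical helper in the spirit of the "throw if" idiom: the early
// exit reads as a single statement at the top of the calling function.
function throwIf(bool $condition, string $message): void
{
    if ($condition) {
        throw new InvalidArgumentException($message);
    }
}

// Hypothetical lookup function used only for illustration.
function findUser(array $users, int $id): string
{
    // Early out expressed through the helper.
    throwIf($id < 1, 'Invalid id');

    // PHP 8.0+: throw is an expression, so it can be a ternary branch.
    return isset($users[$id])
        ? $users[$id]
        : throw new OutOfBoundsException("No user $id");
}

echo findUser([1 => 'ada'], 1), PHP_EOL; // prints "ada"
```

Note that the throw can appear as a ternary branch, but a return cannot, which is the gap the speaker points at.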

Derick Rethans 20:12 What's been the biggest criticism so far?

Ralph Schindler 20:15 Biggest criticism is we can already do this. See, I hear that all the time, with all sorts of other features to varying levels varying degrees. I can do this with if something return something early. I said earlier that the proposed syntax might not be shorter and that's true. It is just changing the order of the operators, or the order of the keywords but, you know, that's an important distinction, like I want the precedence of the return to be earlier in the line. I think that's the important distinction. And I feel like maybe people that are saying it doesn't reduce the amount of code need to take that into account. And it's hard to see it really take that into account, unless you see variations of this sort of mental model of code. That's on me. I've been taking all the sort of like criticism, I'm kind of in a cooldown phase right now. I've been looking, I've been watching Twitter, I've been watching the Reddit. It's generally cooled down on internals mailing list, and I'm just kind of thinking about it because going back to likening this to a political sort of thing is that I have to rephrase my argument so that people that have a very firm stance on: I don't like this because I don't like it, or I don't like this because it doesn't shorten my code. I have to find an argument that gets them to start thinking about why this might be a good thing. I understand like this might get shot down in PHP. Right now, if I was a betting man, we were in Vegas, and someone asked me: Do you think this is going to go through, I probably would have to bet against myself I think 40-60. The temperature that I've taken on internals and everywhere else seems to indicate that it wouldn't be successful, but I'm collecting my evidence right now and putting out a blog post that kind of explains why it is, what it is, and putting a better argument forward. 
If that can't push it over the threshold, you know, I'll accept the defeat, so to speak. Look at the history of PHP: annotations, or attributes, whatever they were called, were shot down eight years ago. And, interestingly, I used annotations back in the day with Doctrine; I no longer use Doctrine. So I voted to accept them. I might have voted to not accept them eight years ago, and I voted to accept them now, even though I don't use a variation of that any more.

Derick Rethans 22:15 There's a few things that keep changing over time, right? First of all, people turn from junior programmers into senior programmers, so they think more and more about how to structure code. And at the same time they also start seeing the value of some things that PHP never had, right? A good example is scalar typing; that's been spoken about for maybe 15 years even, and it took so many different approaches. And, as you say, attributes, although attributes are a little bit different, because this RFC is absolutely not the same as the earlier ones: the implementation is quite different from version one, and it ended up solving lots of problems that people found with the original RFC.

Ralph Schindler 22:53 I have not been part of sort of the global PHP community. I started in the mid-2000s, having worked with PHP since 1998. I remember the early days where PHP was not fast at all. It wasn't as fast as other things, but I gravitated towards it because I liked the syntax. Back in that day, I would have had more of an emphasis on things that would run faster, regardless of how they look. I had projects, for example: in college I wrote a program where kids would go up, around Valentine's Day, and put all their preferences in. That was in the week leading up to Valentine's Day, and then on Valentine's Day they could come back to the University Center and get a printout of all the other people that had filled out the questionnaire and matched. When you have 1000 people fill out a questionnaire, and this was PHP in '99 or 2000, I can tell you it took hours for the script to run and calculate all of the matches for a person. Changing just the way an if statement would run, or changing the way you early exited an if statement when you knew that you had to filter out a person, changed the run time by hours. The code was very, very closely tied to the performance, whereas now, with PHP eight, I think that we have so many more affordances. You don't have to think about: should I interpolate strings inside of single quotes or double quotes? None of that matters any more. We've solved all those problems. You can call sprintf just as quickly as you can do an echo, and no one really cares, it's gonna perform the same. That wasn't the case 20 years ago; it is the case now. So now we have this affordance where we can look at, for lack of a better term: is the code pretty, is it easy to read?

Derick Rethans 24:32 Thank you all for taking the time this afternoon, or in your case morning, I think, to talk to me about your RFC. I'm looking forward to seeing this coming to vote at some point.

Ralph Schindler 24:43 I appreciate you having me on the, on your podcast. Thank you.

Derick Rethans 24:47 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 56: Mixed Type v2

In this episode of "PHP Internals News" I chat with Dan Ackroyd (Twitter, GitHub) about the Mixed Type v2 RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:20

Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 56. Today I'm talking with Dan Ackroyd about an RFC that he's made together with Máté Kocsis; it's called the mixed type version two. Hello, Dan, would you please introduce yourself?

Dan Ackroyd 0:38

Hi Derick. So my name is Dan Ackroyd, also known as Danack online. I maintain the PHP Imagick extension. And I also contribute to PHP internals a little bit by maintaining a set of documents called the RFC Codex, which is a set of notes on why certain ideas haven't reached fruition in PHP core, and occasionally I help other people write RFCs.

Derick Rethans 1:04

We've been continuing with the improvement of PHP's type system in the last few releases, and we've seen a few more things coming into PHP eight, with union types. For a long time, there has been an issue with PHP's internal functions: the types that they return cannot necessarily be represented in PHP's type system, because they do strange things. Is this RFC building more on top of PHP's type system? What is this trying to solve?

Dan Ackroyd 1:29

There's a couple of different problems that it's trying to solve. The one I care more about is userland code; I don't actually contribute that much to internals code, so I'm not that familiar with all the problems that it has. The reason I got involved with doing the mixed RFC was: I had a library for validating parameters, and due to how that library needs to work, the code passes user data around a lot internally, and then back out to where the library returns the validator's result. So I was upgrading that library to PHP 7.4, and that version introduced property types, which are very useful things. What I was finding was that I was going through the code, trying to add types everywhere I could. And there was a significant number of places where I just couldn't add a type, because my code was holding user data that could be any type. The mixed type had been discussed before, an idea that people had kind of been kicking around, but it had just never really been worked on. That was the motivation for me: I was having this problem where I couldn't upgrade my library as I wanted to. I kept wondering: has this bit of code here been upgraded and I just can't add a type, or is it the case that I haven't touched this bit of code yet? So coincidentally, I saw that Máté was also looking at picking up the RFC, and he had copied the version that Michael Moravec had been working on previously. As I mentioned earlier, I help people write RFCs. For a lot of people where English isn't their first language, writing technical documents in English is a difficult thing to do. I also think that writing RFCs in general is slightly harder than people really anticipate. Each RFC needs to present clearly why something's a problem, why the proposed solution would work, and, at least to some extent, why other solutions wouldn't work.
Looking at the text from the previous version, I could see that, although I understood all of the parts of that RFC, I didn't think that it made the case for why mixed was the right thing to do in a very clear way. So I spent some time working with Máté to redraft the RFC, discussing it between ourselves and going through a few of the smaller issues before presenting it to internals, for it to be officially discussed as an RFC.

Derick Rethans 3:51

Where does the name mixed actually come from?

Dan Ackroyd 3:54

So, mixed is actually a very old concept in PHP it's been used in the docs for multiple decades. I think we have multiple core contributors who are younger than the mixed type, which is an interesting situation for a language to be in. It had been used in the documents, all over the place. It has been used to show that the type of a parameter, or return type from functions was quite complicated. It's actually slightly different from how people might use it in userland code. A lot of the places where it's used in the docs would now use a union type there instead of the mixed type. But there are still places where mixed is the correct type to use in the documents.

Derick Rethans 4:40

This being an RFC, you're proposing something to do in it. What are you proposing to introduce into PHP?

Dan Ackroyd 4:46

To be precise, the RFC proposes being able to use the word mixed as a type, to be used for parameter types, return types, and property types. And mixed is really a shortcut for something that can be done with union types: mixed is the equivalent of writing array or bool or callable or int or float or null or object or resource or string. One of the benefits of mixed is that it's much shorter to type than the full equivalent to that.
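A small sketch of the three positions named above (parameter, return, and property types), assuming PHP 8.0 or later. The class `Box` and function `passThrough()` are hypothetical names chosen for this example. As far as I know, resource cannot be written as a type declaration in userland code, so the long union should be read as a conceptual equivalent rather than something to type out.

```php
<?php
declare(strict_types=1);

// Hypothetical container holding user data of any type.
final class Box
{
    // mixed as a property type.
    public mixed $value = null;

    // mixed as a parameter type.
    public function set(mixed $value): void
    {
        $this->value = $value;
    }

    // mixed as a return type.
    public function get(): mixed
    {
        return $this->value;
    }
}

// "Mixed on the way in, and mixed on the way out."
function passThrough(mixed $value): mixed
{
    return $value;
}

$box = new Box();
foreach ([42, 1.5, 'text', [1, 2], null, true] as $item) {
    $box->set(passThrough($item));
    echo get_debug_type($box->get()), PHP_EOL;
}
```

The loop prints the type name of each stored value, showing that one declaration covers ints, floats, strings, arrays, null, and booleans alike.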

Derick Rethans 5:18

And you'd have to do this every time you use it.

Dan Ackroyd 5:20

It's particularly hilarious when you've got a function that accepts any type of parameter, and then returns that parameter, that's been modified. So you have mixed on the way in, and mixed on the way out, having all of those words on the same line of code is just too much.

Derick Rethans 5:35

Does that mean that mixed is pretty much implemented as a union type?

Dan Ackroyd 5:39

I have no idea. I'd have to refer you to the actual implementation, which I can't recall the details of off the top of my head. The actual internal type checking in PHP is not as clean as you might imagine from userland, particularly around things like callable; it's not a straightforward path of code for checking whether something's callable. It works as a union type, but how it is actually implemented internally is probably more detailed than that.

Derick Rethans 6:07

I'll have a good look a little bit later then. As you said, it sort of acts as a union. But union types and variance are quite tricky. When I spoke with Nikita about union types, it wasn't the clearest explanation, because it's a really difficult concept, right? So how does the mixed type interact with variance in arguments, return types, or properties?

Dan Ackroyd 6:30

I agree completely. Variance is a complicated thing, and the Liskov substitution principle is a reasonably complicated thing. Full disclaimer here: I am not a computer scientist, I didn't study computer science at university. I studied chemistry and molecular physics, and the only formal education I've had in programming was a single 10 hour course that taught us how to use Fortran 77, which is a lovely language for the 70s, not quite so good for the 1990s when I was learning it. I think people concentrate too much on the theory behind computer science. If I read out the general rule of LSP, or the Liskov substitution principle, it says: if for each object O1 of type S there is an object O2 of type T, such that for all programs P defined in terms of T, the behavior of P is unchanged when O1 is substituted for O2, then S is a subtype of T. I don't fully understand that. I mean, I can go through it and understand it in principle, but I don't grok it at a fundamental level when I'm writing code. For me a better way of thinking about LSP is to simply say: if your code follows LSP, then it's probably not going to blow up. If you violate LSP, your code has a very good chance of blowing up. For both parameter types and return types, the type checking that PHP implements through variance is done to make it conform with LSP, but the simplest way of putting it is: make sure that your code's not going to blow up on bad assumptions about the types that are being passed around.

Derick Rethans 8:17

Because PHP does adhere to LSP, your lovely new mixed type does have to adhere to it. How does your lovely new mixed type tie in with LSP and variance specifically? Because mixed is a little bit special in some cases: at the moment in PHP, if you have a method and you return nothing from it, it sort of acts like mixed. So I saw that in the RFC there is specific handling of going from no type to mixed and then back to no type.

Dan Ackroyd 8:48

One of the details in the RFC is that when no type is present for a function's return, the signature checks for inheritance are done as if it had a mixed or void type, so that's a union type of mixed and void. That's the correct thing to do. It makes the code work as you'd expect it to, and avoids any possible scenarios where you'd make an assumption about the method in the parent class, and that assumption not being true in the child class. I think this is one of the areas where PHP's special behaviour shines through. This might not be an acceptable solution to people who work in languages that have a cleaner type system, but they probably stay well clear of PHP to begin with. The details of how it works mean that the code behaves as you'd expect it to and doesn't blow up.
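The inheritance rule described here can be sketched as follows, assuming PHP 8.0 or later. The class names are hypothetical. The comment about the rejected case reflects my understanding of the behaviour, so treat it as an assumption rather than a statement from the transcript.

```php
<?php
declare(strict_types=1);

// Parent written without any types, as in older code.
class LegacyHandler
{
    public function handle($value)
    {
        return $value;
    }
}

// Child adds mixed in both positions. This is accepted: an untyped
// parameter is checked as mixed, and an absent return type is checked
// as "mixed or void", so declaring mixed only narrows the return.
class TypedHandler extends LegacyHandler
{
    public function handle(mixed $value): mixed
    {
        return $value;
    }
}

// Assumption: the reverse direction is rejected. If the parent declared
// ": mixed" and the child dropped the return type, the child could
// effectively return void, which callers of the parent cannot expect.

$handler = new TypedHandler();
var_dump($handler->handle('data')); // string(4) "data"
```

Adding types in a subclass of untyped legacy code is therefore safe, while removing a declared mixed return type in a subclass is not.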

Derick Rethans 9:42

Well, that's the reason why void isn't part of the mixed union?

Dan Ackroyd 9:47

Mixed and void are related, but quite different from each other. For return types, mixed is a guarantee that a value will be returned, but we can't give you any more details of what the type of that value will be. Void is a guarantee, in quotes, that no value will be returned. I actually strongly regret void being present in PHP. I think it was a mistake. One of the very nice things about PHP is the way that every function returns null, even if you don't have a return statement in that function. This is something that's quite different from a lot of other languages, where it's common to have functions declared with a void return type, so there's no return value at all. Because PHP always returns null, it allows you to do things like var_dump, then put a function call inside the var_dump brackets, and that's always guaranteed to not blow up.

I would have strongly preferred us to introduce the null type to PHP, and for people to use that when they're not returning a more semantically meaningful value from their function. I think that would actually fit a lot better into the PHP type system, and make it a lot easier to write code that's chainable.
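The behaviour described above is easy to check with a short sketch. Both function names are hypothetical; the point is only that a function without a return statement, and even one declared void, still evaluates to null when its result is read.

```php
<?php
declare(strict_types=1);

// No return statement and no declared return type.
function noReturnStatement()
{
}

// Declared as void: may not return a value, yet reading the
// result of a call still yields null rather than an error.
function declaredVoid(): void
{
}

var_dump(noReturnStatement()); // NULL
var_dump(declaredVoid());      // NULL
```

This is why wrapping any function call in var_dump does not blow up, which is the property the speaker wants to keep.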

Derick Rethans 11:10

The only real locations where it can't return any values is a constructor and a destructor in PHP.

Dan Ackroyd 11:16

It would still have a use for functions that never return. So, like, continual loops, and also functions that only ever exit by throwing an exception. I think TypeScript has this, I think they call it none. I can't remember the details, but it has its uses. But the way that most people are using it in PHP is wrong, in my opinion. The reason I still get a little bit worked up about this is because people are still suggesting that we should change the behaviour of the language to match the void return type, i.e. make it so that if you try and use the return value from a function that has a return type of void, PHP should blow up. I just strongly disagree with that. I think returning null, so that functions can be chained together even if there's no semantically useful information there, is preferable to having code blow up through trying to read the result of a function.

Derick Rethans 12:12

Because it's a bit different than in statically typed or compiled languages, where you can do all these checks in the compiler, right, and never at runtime, whereas in PHP these checks always have to happen at runtime.

Dan Ackroyd 12:23

They do, but I think it's at a different level than that. It's just: being able to define the fact that reading from a particular function should make the program blow up, is that a useful thing to do or not? This is quite similar to another discussion that pops up every now and again, of whether to make PHP blow up if too many parameters are passed to functions. There's people who strongly feel that this is a terrible thing to allow, that we need to punish anybody who has extra parameters being passed around. I actually find having extra parameters to be a useful debugging technique, very occasionally. Imagine scenarios where you've got an interface that comes from a library and that's implemented in 10 different classes in your code, but you want to debug one particular implementation. Just being able to temporarily add on some extra parameters to a method call, and have that just work, allows you to do some debugging techniques that just wouldn't be possible if PHP blew up when extra parameters get passed.

This is similar, really similar to the void discussion where people have very strong feelings about, we need to punish people who are writing code wrongly, we need to stop that code from working. The other way that yeah it's not great code, and maybe they might want to refactor their code to not do that, but I can't see any benefit in making PHP blow up.
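The debugging technique described above can be sketched as follows. All names here (`Renderer`, `DebuggableRenderer`, `lastExtras`) are hypothetical; the only engine behaviour relied on is that user-defined functions and methods accept surplus arguments, which `func_get_args()` can then read.

```php
<?php
declare(strict_types=1);

// Stand-in for an interface that would normally come from a library.
interface Renderer
{
    public function render(string $template): string;
}

// One of many implementations; this is the one being debugged.
final class DebuggableRenderer implements Renderer
{
    /** Extra arguments seen on the most recent call (debugging aid only). */
    public array $lastExtras = [];

    public function render(string $template): string
    {
        // func_get_args() returns every argument passed, including any
        // beyond the declared parameter list.
        $this->lastExtras = array_slice(func_get_args(), 1);

        return strtoupper($template);
    }
}

$renderer = new DebuggableRenderer();

// The interface declares one parameter, but the extra arguments are accepted.
$output = $renderer->render('hello', 'called-from-checkout', 42);

echo $output, PHP_EOL;
var_dump($renderer->lastExtras);
```

Note that this applies to userland code; internal functions are stricter about surplus arguments.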

Derick Rethans 13:49

In my opinion, this is something that belongs in a project's coding standards, and in the static analysers that they run over the code to make sure that they get all their stylistic choices correct. Not having too many arguments to methods is exactly what belongs in that category. Right?

Dan Ackroyd 14:05

I agree completely.

Derick Rethans 14:06

There's a few more things that I'd like to poke your mind about. The mixed type cannot be made nullable, is there a reason for that?

Dan Ackroyd 14:14

We discussed this a reasonable amount when drafting the RFC. There's reasons to allow nullability, but what we couldn't see was a clear, strong need for why nullability would be required. The mixed type includes null as one of the types in the union of types it represents. So adding nullability doesn't actually add any more information to the mixed type, because by definition it already can be null. It's always possible to add more to PHP core, but removing features is really difficult. So we decided to leave it out for now, just because we can't think of a really strong reason to add it. If someone finds a really clear, compelling argument to allow mixed to be nullable, I would definitely be in support of that, so long as there was a reasonable reason to have it. What I'd probably prefer before that, though: it's kind of odd that the null type isn't usable as a type in PHP by itself. I think that's unfortunate because for union types, imagine you've got some code that's going to return either a float or int, and then you find a reason why it might need to return null. Changing the definition from float or int, to float or int or null, is easier to read for me than question mark, float or int. So I think that might be another RFC that pops up on the radar in the not terribly distant future.
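A short sketch of both points, assuming PHP 8.0 or later. The function names `describe` and `parseNumber` are hypothetical. A `mixed` parameter already accepts null, and a union type can list `null` explicitly as one of its members.

```php
<?php
declare(strict_types=1);

// mixed already includes null, so no nullable marker is needed.
// (Writing ?mixed is rejected at compile time, so it is not shown live here.)
function describe(mixed $value): string
{
    return get_debug_type($value);
}

// The union spelled out in full, with null as an explicit member.
function parseNumber(string $input): float|int|null
{
    if (!is_numeric($input)) {
        return null;
    }

    // Adding 0 converts a numeric string to int or float as appropriate.
    return $input + 0;
}

echo describe(null), PHP_EOL;               // null
echo describe(parseNumber('42')), PHP_EOL;  // int
echo describe(parseNumber('1.5')), PHP_EOL; // float
echo describe(parseNumber('abc')), PHP_EOL; // null
```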

Derick Rethans 15:38

Time is running out for PHP eight a little bit, of course. So resource is part of mixed, but resource as a type you can't use as a type hint anywhere in PHP. So what's going on here?

Dan Ackroyd 15:51

Resource is more of a pseudo type than a real type in PHP. It comes from code that was written before PHP even had classes, is my understanding. Though obviously that's from the dawn of time, so it's hard to figure out where. When people started writing PHP, they used resource as we use classes now, to represent a complicated bit of state that needs to be passed around from one piece of code to another. The problem with resource as a type is that it doesn't really tell you that much about the type. If something is a resource, it could be a file handle, a curl handle, a GD image, an XML parser, or any of the other things that are called resource types. It's an ongoing piece of work to slowly refactor resource types away and replace them with classes wherever possible. An example of that is the hash context, which used to be a resource type in PHP, and I think since PHP 7.2 that's been changed to a class. Work's ongoing, and eventually hopefully most of the other resources will go away and be made into more specific types, but in the meantime resource still exists in PHP. The reason that's included in the mixed definition is because it's a reasonable thing to do to pass a file handle around. And so if you've got a parameter type of mixed, it's absolutely fine to pass in a file handle to that piece of code. Excluding the resource type would make the mixed type too annoying to deal with, because your code would then deal with all the other types except resource.
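As a small illustration, assuming PHP 8.0 or later, a `mixed` parameter accepts both a stream resource and the class-based hash context mentioned above. The helper name `typeOf` is hypothetical; `get_debug_type()`, `fopen()` and `hash_init()` are standard functions.

```php
<?php
declare(strict_types=1);

function typeOf(mixed $value): string
{
    return get_debug_type($value);
}

// A file handle is still a resource, and mixed accepts it.
$handle = fopen('php://memory', 'r+');
$handleType = typeOf($handle);

// The hash context is an object of class HashContext since PHP 7.2.
$contextType = typeOf(hash_init('sha256'));

fclose($handle);

echo $handleType, PHP_EOL;  // resource (stream)
echo $contextType, PHP_EOL; // HashContext
```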

Derick Rethans 17:21

That makes sense. As I mentioned in the introduction, mixed is already something that's been used in the PHP documentation for a long time, and the RFC talks about stubs in PHP. This is something that is going to be introduced with PHP eight as well. What are these stubs?

Dan Ackroyd 17:38

I haven't contributed to any of this work, so I apologize to anybody who has been doing this piece of work if I get any of the details wrong. One of the problems with PHP core was that for a long time, the information that was used to generate the reflection information was done on a very ad hoc basis. Some of the information was incorrect, and keeping the reflection information up to date with the actual definitions of how the functions work was annoying, to say the least. There's been an effort by a number of the core contributors to set up a system of stub files that allow people to write PHP code that defines a stub for each of the internal functions. So that's just literally a PHP file that has a stub version of the function that just defines the parameter types, parameter names, and the return types. My understanding is that that information is then used internally by the PHP eight build process to generate the reflection information and extract the parameters where appropriate, and could be used for features like named parameters, where the name of a parameter would be coming from the stub file, rather than some random C file in the middle of the PHP core code.
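To give a feel for what such a stub looks like, here is a rough sketch, assuming PHP 8.0 or later. The function `clamp_value` and the `StubSketch` namespace are hypothetical and are not taken from the actual PHP source tree. The reflection loop only illustrates that a signature with an empty body carries all the information described above; it is not how the real build tooling works.

```php
<?php
declare(strict_types=1);

namespace StubSketch;

// Hypothetical stub: the body is empty on purpose. Only the parameter names,
// parameter types and return type matter. It is never meant to be called.
function clamp_value(int $value, int $min, int $max): int {}

// Read the signature back through reflection.
$function = new \ReflectionFunction('StubSketch\clamp_value');

$signature = [];
foreach ($function->getParameters() as $parameter) {
    $signature[] = (string) $parameter->getType() . ' $' . $parameter->getName();
}

$returnType = (string) $function->getReturnType();

echo implode(', ', $signature), ' : ', $returnType, PHP_EOL;
// int $value, int $min, int $max : int
```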

Derick Rethans 18:53

And the stubs at the moment can't represent mixed. There's still a hold on, with comments.

Dan Ackroyd 18:58

That's correct. This is similar to what I was finding with my own libraries, that there were just some things that you just can't currently add type information for. And it was quite frustrating having to say: oh no, somebody hasn't missed this one, it's just not expressible. Another reason for having mixed is that, although generics are still going to be quite a long way off from arriving in PHP, if you wanted to express just a generic array that can contain any possible value, that's another case where the mixed keyword would be used.

Derick Rethans 19:29

I saw some people ask why mixed was chosen here and not any. Is there any specific reason for that?

Dan Ackroyd 19:36

The very short reason is that it was easier. PHP has had a mixed concept for multiple decades; mixed is used widely in PHP core code and documentation. It's also used widely in the community for tools like PHPStan and Psalm, where people use mixed in docblocks or Psalm annotations to indicate any type. It's really widely established. We did discuss using any instead. It just didn't seem worth the effort of trying to push it through, at least in part because there's so much legacy going on. Also, it's just not clearly that much superior to mixed.

Derick Rethans 20:16

Very well. Are there any BC concerns by introducing the mixed keyword?

Dan Ackroyd 20:20

That's a small BC break: you can't use mixed as a class name or function name probably any more, but it's a pretty small one, and anybody using an IDE who has a function called mixed in their code can right click on the function, rename, maybe go and get a cup of coffee if that IDE is slow. There are also tools in the PHP community. This is actually quite a surprising thing, that PHP has one of the best refactoring tools out there in Rector. That's a tool that, because it understands the abstract syntax tree of PHP, can understand that: oh hey, there's this new BC break in the next version of PHP. In this case, if you have some code that had a class named mixed, it would understand this is going to break. They provide sets of tools for allowing you to upgrade your code automatically. It's a really awesome tool. It's slightly surprising to me that it's probably one of the best code refactoring tools, if not the best, in any software language. I've looked at some other languages' ecosystems, and I think one of the things about PHP is that because it's actually quite a diverse ecosystem, and people sometimes migrate from Symfony to Laravel, or want to upgrade a PHP 5.6 codebase to PHP seven, or those types of things, the value in a refactoring tool is a lot higher. Somebody has gone out and done the work to make that tool, and it's really pretty good.

Derick Rethans 21:46

Sounds like something I should investigate a little bit then, because I actually had never heard of it. I'll also make sure to link to it in the show notes. When you were introducing yourself, you mentioned that you're the maintainer of the Imagick extension for PHP, which you can use to manipulate images. What's going on with this extension? Is there going to be an upcoming release at some point?

Dan Ackroyd 22:05

I want to apologize to everybody for being very lazy and not doing a release, even though there's a small segfault that happens occasionally, which we have a fix for. To be honest, I don't really use the extension at all myself. And so maintaining it is more a source of stress than enjoyment. I know there's many, many things that could be improved for the project, including doing releases on a timely basis and improving the security of how it works, but it's just really hard to justify spending time working on it when it's just a source of stress for me and doesn't really provide any benefit to me. As an effort to make it worth my time, effectively, or at least give me a goal to focus on, I'm going to start asking people to donate money to the project, to sponsor it, just so that I can actually justify myself getting stressed out from trying to help people with impossible-to-solve bugs that only happen on their system, because otherwise it's just a bit too much stress for me to really want to spend much more time working on it.

Derick Rethans 23:08

Very well, do you have anything else to add?

Dan Ackroyd 23:10

Yes, I have a big request, and you've done this a couple of times during this interview. I'd very much appreciate it if everyone in the PHP community could refrain from using the word hints when talking about types. It used to be that PHP's type system was just hints, where yeah, the documentation says that this function takes an int, but that was just a hint, and it wasn't really enforced. The type system in PHP has evolved into an actual type system that is enforced at runtime, and although it's not a big deal, it does help when talking amongst ourselves as a community, but also when we're talking to people who don't do that much PHP, who are coming from other languages where their type system is still just a set of hints. Using the slightly more precise language of the PHP type system, with parameter types, return types, and property types, avoids any confusion about what's actually happening in the engine. And that is my windmill that I tilt at.
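The difference between a hint and an enforced type can be shown in a few lines, assuming PHP 7.0 or later. The function name `twice` is hypothetical. Even without `strict_types`, passing a value that cannot be converted to the declared parameter type makes the engine throw a `TypeError` at runtime.

```php
<?php

// The parameter type is enforced by the engine, not just documented.
function twice(int $number): int
{
    return $number * 2;
}

// In the default (coercive) mode a numeric string is converted to int.
$coerced = twice('21');

// A non-numeric string cannot be converted, so the call is rejected.
try {
    twice('not a number');
    $outcome = 'accepted';
} catch (TypeError $error) {
    $outcome = 'rejected';
}

echo $coerced, PHP_EOL; // 42
echo $outcome, PHP_EOL; // rejected
```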

Derick Rethans 24:11

Alright, thank you, Dan for taking the time this afternoon to talk to me. And I will be looking forward to seeing mixed in PHP because it got accepted, just earlier, yesterday I think. And, yeah, part of PHP's improving type system again.

Dan Ackroyd 24:25

Thanks for having me on. It's been a pleasure.

Derick Rethans 24:28

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 55: Dealing with Bugs


In this episode of "PHP Internals News" I chat with Ignace Nyamagana Butera (Twitter, GitHub, Blog) about how the PHP project handles bugs and bug reports.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 55. Today I'm talking with Ignace Nyamagana Butera after he'd asked me on Twitter, how PHP deals with bugs. A few episodes ago, I did a Q&A session about the RFC process. And this time again, we'll have Ignace Nyamagana Butera asking the questions. Would you please introduce yourself?

Ignace Nyamagana Butera 0:46

Hello, everyone. Hello, Derick. My name is Ignace Nyamagana Butera, but you can call me Nyamsprod. I've been a PHP developer for around 15 years now. Currently, I'm working as a software developer and technical lead in an internet content provider agency. When I have free time, I'm doing some open source. I have a couple of projects that you may have heard of, like league CSV and league URI. I created them and I am currently maintaining them.

Derick Rethans 1:23

Yeah, as I said, it is not me asking the questions, it's you this time. So I think we should jump straight in actually.

Ignace Nyamagana Butera 1:30

So my first question will be somehow really simple, because we are talking about bugs. And I was wondering if we had some statistics about bugs in PHP.

Derick Rethans 1:44

Well, there are some statistics. I mean, it's not really easy to get that information out of our bug system. But just having had a look, on average maybe one bug a day gets reported at the moment, and there are nearly 80,000 bugs in the bug system. Of course, not all of these are closed, some of them are open, but the majority of them are closed.

Ignace Nyamagana Butera 2:07

Are bugs from EOL PHP versions still being taken into account, or do we just say: okay, these bugs, for instance, are for PHP five, we'll no longer look at them?

Derick Rethans 2:18

If it's a bug, unless it's a security bug, we won't look at it for unsupported PHP versions. So at the moment, PHP seven three and seven four are still supported, so those bugs we'll of course look at. If it's a security bug, we will only go back to PHP seven two. If it's reported for any version older than seven two, for example seven one or seven zero, or even PHP four or five, which does happen occasionally, we'll tell them to upgrade first, because we won't spend time doing that.

Ignace Nyamagana Butera 2:47

Because I manage and maintain open source projects, I know that PHP as a language is used everywhere and you can have multiple reports. First things first, what is a bug? Because there are multiple definitions of it.

Derick Rethans 3:03

And I'm sure if you asked 12 people, you'd get 13 definitions. I think it is unexpected behavior of something that is documented. So if something is documented to do this, and it does something else, or it does something really wrong like crash your program, then that will be a bug.

Ignace Nyamagana Butera 3:21

What is the source of truth? Is it the PHP documentation? Is it the PHP language specification? What is the source of truth? Nothing? Is it: okay, this is expected behavior because it is documented? Or how does it work?

Derick Rethans 3:38

For most of the syntax, it's what the source does. And of course, you always find edge cases, and I don't have a good example right now. For anything that is syntax, I mean, documentation and behavior should absolutely always work the same. If it doesn't, it's likely going to be a bug in the documentation. If you, for example, look at other functionality, like in an extension, it is almost as likely that the documentation is wrong as it is that the code's behavior is wrong. In that case, we need to have a good look at what the expected behavior should have been. Now, with all the new features that have been put in since we have the RFC process, pretty much however the RFC describes it should work is how the feature should work. And if it doesn't, that pretty much means there's a bug. Having said that, not everybody writes down all the expected behavior for all the functionality that an RFC has been put up for. And in those cases, you just need to see what makes the most sense when it's about a core feature.

Ignace Nyamagana Butera 4:40

What is the best way to report a bug? Okay, you have to go to bugs.php.net, I suppose. Yes. But apart from that, what is the best way to report a bug?

Derick Rethans 4:51

As you said, PHP's issue tracker is bugs.php.net. It tells you to fill in your problem, your expected behavior, and what you actually get out. What is always really important for people to be able to fix an issue, and to find out whether there is an issue to begin with, because that's not always the case either of course, is to have a short script that reproduces your problem. And by short, that means as short as you can get it. 10 lines at most for most syntax features will probably do the job. In some cases, if it's a bug for a database related system, then of course there's going to be some database setup necessary for it. But if it's just syntax, then a short script that reproduces the problem and shows what goes wrong is really important. And of course, it's also important to say what it did, and what you expected it to do. Also, don't lie about your PHP version, because in some cases people try to report a bug with a higher PHP version than they're actually using, which is kind of frustrating at times.

Ignace Nyamagana Butera 5:52

I guess that yeah, if we report something that didn't work in PHP five, but it was fixed in PHP 7.2 or PHP 7.3 everybody loses a little bit of time.

Derick Rethans 6:02

And in some cases people file a bug report for, say, PHP 7.4.1, right, and we're currently at 7.4.6. We will always ask them first to upgrade if they can, because upgrading PHP should take a lot less time than trying to reproduce and fix a problem that has already been fixed.

Ignace Nyamagana Butera 6:20

And what is the strategy between the release of each version of PHP and the bug fixes? Does PHP wait for all the bug fixes to be done, and then a release is made? Or if, for instance, I report a bug today, before a release is scheduled, will this bug be skipped from the next release and be tackled after?

Derick Rethans 6:46

Every minor version of PHP, be it seven two, seven three, or seven four at the moment, has a release every four weeks. Two weeks and two days before a release gets made, we make our release candidates. Everything that has made it into the release candidate will make it into the release. If bugs get fixed in between the release candidate being created and the final release, then unless they are really critical, they will not make it into that release, but will have to wait until the next cycle. So we don't necessarily wait for all the bugs to be fixed before we make a release. Now, there is an exception here, and that is for security bugs. If you find security bugs, they don't end up in a normal PHP seven four branch. They get committed to a security repository that very few people have access to. And these security bug fixes get merged into the release branches two days before the release comes out. They don't end up in the release candidate builds, because we don't want people to have 16 days to be able to exploit security bugs if they are remotely exploitable, for example.
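The timing described here is simple date arithmetic, sketched below. The release date is a hypothetical example chosen only for illustration; the intervals (16 days, 2 days, 4 weeks) are the ones stated above.

```php
<?php
declare(strict_types=1);

// Hypothetical release date, used only to illustrate the arithmetic.
$release = new DateTimeImmutable('2020-06-11');

// Release candidate: two weeks and two days (16 days) before the release.
$releaseCandidate = $release->sub(new DateInterval('P16D'));

// Security fixes are merged two days before the release.
$securityMerge = $release->sub(new DateInterval('P2D'));

// The next release of the same branch follows four weeks later.
$nextRelease = $release->add(new DateInterval('P4W'));

echo 'Release candidate: ', $releaseCandidate->format('Y-m-d'), PHP_EOL; // 2020-05-26
echo 'Security merge:    ', $securityMerge->format('Y-m-d'), PHP_EOL;    // 2020-06-09
echo 'Release:           ', $release->format('Y-m-d'), PHP_EOL;          // 2020-06-11
echo 'Next release:      ', $nextRelease->format('Y-m-d'), PHP_EOL;      // 2020-07-09
```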

Ignace Nyamagana Butera 7:53

And can security bugs, or critical bugs push a release?

Derick Rethans 7:59

Technically, yes. If somebody ends up finding, like a remote exploitable bug in PHP, then there will be an emergency release for them. But I can't remember the last time we had to do that.

Ignace Nyamagana Butera 8:10

I remember, like one or two years ago, there was a bug that was going from the bug tracker to the internals mailing list and coming back again to the bug tracker, because there was some kind of indecision about whether it was a bug, or whether it should be a feature. How is this possible?

Derick Rethans 8:32

We don't really have a set method for doing this. But our bug tracker isn't the most advanced system in the world. And sometimes it just makes sense to thrash out a discussion over email on our PHP internals mailing list, or sometimes these discussions happen on other chat channels as well, I'm sure, just to go through to see what's the case. And sometimes, if it is hard to take a decision on whether there's a bug, then it is always a good idea that more PHP core developers have a look at it and see what's going on there. So sometimes it makes it easier if that's discussed on the mailing list than in the bug tracker.

Ignace Nyamagana Butera 9:04

Is it possible that, for instance, someone submits an RFC, and then during the course of discussion of this RFC, it becomes clear that this is not an RFC, but more of a bug fix?

Derick Rethans 9:16

I don't think I can think of an example here actually.

Ignace Nyamagana Butera 9:19

I remember one example.

Derick Rethans 9:21

Okay.

Ignace Nyamagana Butera 9:23

Because I think it was, yeah, two years ago, about the behavior of the CSV escape character. And I remember at some point, it was suggested to be an RFC. And because of the amount of backward compatibility breaks, it was better to treat it like a bug. But I remember that between the bug tracker and the mailing list there was a whole discussion about exactly being able to say: okay, this is a bug, and this is an RFC. And it was really not clear; there was a call at the end saying, okay, we will treat it like an RFC, and we will change the way the escape character works today. But it won't be as impacting as if it was an RFC that introduced a completely new behavior.

Derick Rethans 10:12

CSV is a very difficult format, because everybody implements the standard in a slightly different way. And the way it originally got implemented in PHP for reading CSV files was done in a very different way than, for example, what Microsoft products would create. I mean, it has to do with escaping, if I remember correctly. And I mean, what do you decide, right? I mean, since then Microsoft have made a specification for this. And of course, what we then want to do in PHP is to make sure that we support the specification, but by doing so, we will then break previous behavior, and that is always a really difficult decision to make, right? If it is very clear that it is a bug, then we don't mind changing PHP, even though that could technically break people's code. But if it's unsure, or it's based on a subjective decision, then that makes it a lot harder, right, because we can't definitively say that, yeah, we have a bug here. And if we look at other codebases out there, so many people rely on this. So is the old behavior a bug, or is it a feature in PHP? I mean, these things you have to take one by one, and it's very hard to decide on what is a feature and what is a bug in this case.

Ignace Nyamagana Butera 11:22

I think another subject that comes with bugs is that people should be able to fix them. But I suppose that every one of us has work to do, so who can fix those bugs?

Derick Rethans 11:33

Technically, everybody who has time and knows C could fix a bug. PHP is an open source project. Our repositories are available on GitHub, or on git.php.net, which is our source of truth, although most people submit bug fixes against the GitHub repository because it makes it easier to review them and comment on pull requests, for example. But it's open for everybody. It's the same thing with triaging bugs, trying to find out if the bugs that are reported are actual bugs. The bugs.php.net website has, in the top right hand corner, a random link. And if you click that, you get a random bug that hasn't been resolved yet. If any of the listeners, or maybe you, are interested in looking at these bugs or wanting to attempt to fix them, click random and see what happens. Maybe you get something interesting, maybe you get something really complicated, but in any case, it's possible for everybody to fix a bug. They will get reviewed. A good enough bug fix will get merged.

Ignace Nyamagana Butera 12:31

When people think about open source nowadays, they usually think about semver, and people may think that if they look at the versioning of PHP, then they have an idea of whether it is a patch release, a bug fix release, or a feature release. How is this related to bugs, and how does the versioning of PHP work?

Derick Rethans 12:53

PHP's version number consists of three numbers. At the moment, the latest version is 7.4.6. The six is your bug fix release. In bug fix releases, there will not be any new functionality, unless there are very minor, small, contained parts in extensions. We tend not to want to have these. And unless you can make a good case for it, it's unlikely to happen. But it isn't unheard of. An example I think I can remember is that OpenSSL added a bunch of new APIs, and although they were technically new functions in PHP, they sort of had to be supported, as part of making sure that you could run the latest version of OpenSSL or something like that, but that was an exception. Now, the middle number, traditionally, in semver, is there for features, right? You bump the middle number, the middle digit, if you have new features, and that is the same in PHP. What we don't really have is a major number that indicates that we are going to break things. The major number in PHP is mostly a marketing number. So at the moment, we have PHP seven four out there. We do have PHP eight zero next, but that is pretty much a PHP seven five, with additional functionality that we find important enough to bump the major version from seven to eight for. Having said that, we do have a rule that we don't remove functionality unless we bump the major number, for example from five to seven, or from seven to eight. So in the course of time we might deprecate functionality, but we don't tend to remove it until we bump the major number. And you also see that if the major number gets increased, there is potentially more effort in removing or deprecating functionality than we would otherwise do for, say, a change from 7.3.0 to 7.4.0. But it doesn't mean that we bump major numbers so that we can break all the things, for example.
So I think the PHP project tries, we don't always succeed of course, to never break people's code. Unless it's a bug fix.
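The three parts of the version number are exposed to scripts as constants, and `version_compare()` orders version strings accordingly. A minimal sketch follows; the helper `parseVersion` is hypothetical, and the version strings are the ones mentioned above.

```php
<?php
declare(strict_types=1);

// The running interpreter's version, split into its three parts.
printf(
    "major=%d minor=%d bugfix=%d\n",
    PHP_MAJOR_VERSION,
    PHP_MINOR_VERSION,
    PHP_RELEASE_VERSION
);

// Hypothetical helper: splits a plain "x.y.z" string into named parts.
function parseVersion(string $version): array
{
    [$major, $minor, $bugfix] = array_map('intval', explode('.', $version));

    return ['major' => $major, 'minor' => $minor, 'bugfix' => $bugfix];
}

$parts = parseVersion('7.4.6');

// 7.4.6 only adds bug fixes over 7.4.1; 8.0.0 bumps the major number.
$isNewerBugfix = version_compare('7.4.6', '7.4.1', '>');
$isNewerMajor  = version_compare('8.0.0', '7.4.6', '>');

var_dump($parts, $isNewerBugfix, $isNewerMajor);
```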

Ignace Nyamagana Butera 15:03

That was it for my questions.

Derick Rethans 15:06

Maybe I have some questions for you now. I think it is good to talk about these issues. What are you most surprised by in the way the PHP project handles bugs and bug reports?

Ignace Nyamagana Butera 15:15

The first thing is, like I say, I've been coding in PHP for more than 15 years, but I only really started to report bugs once I started doing some open source projects. Because before, I think, and I think it's the majority of people, it's like: yes, there is a bug, oh, it's something for PHP, or for any kind of language. I'm not the maintainer. So it's a bug, someone else will report it, not me. Since then I've changed, because I'm doing some open source myself. I'm like, hey, if I find a bug, I think the best way to resolve that bug is first to report it, and to report it correctly, to the project, to the language, or to whatever has that bug. And once you've made this change in how you think about the language, then you start to ask yourself: okay, how can I do it the most efficient way, so that the bug gets reported? And then the bug can get tackled by the people who can.

Derick Rethans 16:19

Yeah, and the start of that, as you say, is to always send us a bug report, or send your favorite open source project a bug report.

Ignace Nyamagana Butera 16:26

Exactly.

Derick Rethans 16:27

I can sort of see where you're coming from. Because I can understand that if you're just in an agency, for example, the only thing you have to do is to make sure that your project is done on time. You can't necessarily wait for the bug to be fixed in PHP anyway, because the product needs to be done by tomorrow or yesterday. And you're going to have to find a workaround for your issue in that case anyway. And then spending time reporting the bug will just take you time, and you don't have time for that, for example. But of course, if you do that, then everybody else that runs into this bug will have to come up with a workaround, and that means you all end up wasting lots of time.

Ignace Nyamagana Butera 17:04

I remember I had a small story. In one of my previous jobs, someone came to me and we were talking about something, and he said: Oh, but there are no constants on DateTimeImmutable. That's very sad. And I said: no, there are, because I remember I submitted the bug, and it was tackled. And now the constants are on the interface. So DateTimeImmutable has the constants. And he was like: Oh, yeah, but I didn't know. And I was like: it was reported, and now someone can use it. And if you don't report it, then maybe in two years you will ask yourself the same question. Indeed, it takes time between the moment it is reported and the moment it is tackled, because people need to have time to resolve the issue. But if you don't do the first step, which is reporting it correctly, then it will never be solved.
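The outcome of that bug report can be checked directly, assuming PHP 7.2 or later, where the format constants are defined on `DateTimeInterface`. The date used here is arbitrary and chosen only for illustration.

```php
<?php
declare(strict_types=1);

// An arbitrary moment in UTC, used only for illustration.
$moment = new DateTimeImmutable('2020-05-21 12:00:00', new DateTimeZone('UTC'));

// The constant is reachable through DateTimeImmutable because the class
// implements DateTimeInterface, where the constants are defined.
$formatted = $moment->format(DateTimeImmutable::ATOM);
$sameConstant = DateTimeImmutable::ATOM === DateTimeInterface::ATOM;

echo $formatted, PHP_EOL; // 2020-05-21T12:00:00+00:00
var_dump($sameConstant);  // bool(true)
```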

Derick Rethans 17:53

And by correctly, that also means doing it in the PHP bug tracker and not complaining on Twitter.

Ignace Nyamagana Butera 17:58

Exactly. Exactly.

Derick Rethans 18:02

Of which I see quite a bit of for Xdebug for example. Thank you very much for taking the time to talk to me, or I should say thank you very much for taking the time to interview me to talk about bugs today. I hope you enjoyed this.

Ignace Nyamagana Butera

Thank you for having me. And hopefully we'll meet again.

Derick Rethans

I'm looking forward to that. Thanks very much.

Ignace Nyamagana Butera 18:21

Thank you.

Derick Rethans 18:23

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.