PHP Internals News: Episode 55: Dealing with Bugs

PHP Internals News: Episode 55: Dealing with Bugs

In this episode of "PHP Internals News" I chat with Ignace Nyamagana Butera (Twitter, GitHub, Blog) about how the PHP project handles bugs and bug reports.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 55. Today I'm talking with Ignace Nyamagana Butera after he'd asked me on Twitter, how PHP deals with bugs. A few episodes ago, I did a Q&A session about the RFC process. And this time again, we'll have Ignace Nyamagana Butera asking the questions. Would you please introduce yourself?

Ignace Nyamagana Butera 0:46

Hello, everyone. Hello, Derick. My name is Ignace Nyamagana Butera, but you can call me Nyamsprod. I've been a PHP developer for around 15 years now. Currently, I'm working as a software developer, and technical lead in the internet content provider agency. When I have free time, I'm doing some open source, I have a couple of projects that you may have heard of, like, league CSV and league URI. I created them and I am currently maintaining them.

Derick Rethans 1:23

Yeah, as I said, it is not me asking the questions as you this time. So I think we should jump straight in actually.

Ignace Nyamagana Butera 1:30

So my first question will be somehow really simple, because we are talking about bugs. And I was wondering if we had some statistics about bugs in PHP.

Derick Rethans 1:44

Though there are some statistics. I mean, it's not really easy to get that information out of our bug system. But just having had a look, it's about on average, maybe one bug a day gets reported at the moment or is nearly 80,000 bugs in the bug system of course, not all of these are closed, some of them are open, but the majority of them are closed.

Ignace Nyamagana Butera 2:07

Do bugs from the EOL PHP still being taken into account or we just say: okay, these bugs for instance, are for PHP five, will no longer look at them.

Derick Rethans 2:18

If it's a bug, unless it's a security bug fix, we won't look at them for unsupported PHP versions. So at the moment, PHP, seven three, and seven four are still supported. So those bugs will of course look at, if it's a security bug, we only will go back to PHP seven two. If it's reported to any older version and seven two for example, seven one or seven zero, or even PHP four or five, which does happen occasionally, we'll tell them to upgrade first because we won't spend time doing that.

Ignace Nyamagana Butera 2:47

Because I manage and maintain open source project. I know that PHP as a language is used everywhere and you can have multiple reports. First thing first, what is a bug? Because there are multiple definition of it.

Derick Rethans 3:03

And I'm sure if you asked 12 people, you get 13 definitions. I think it is unexpected behavior of something that is documented. So if something is documented do this, and it does something else, or it does something really wrong like crash your program, then that will be a bug.

Ignace Nyamagana Butera 3:21

What is the source of truth? Is it the PHP documentation? Is it the PHP specification language, what is the source of truth? Nothing. Okay. This is expected behavior because it is documented, or how does it work?

Derick Rethans 3:38

For most of the syntax, it's what the source does. And of course, you always find edge case. And I don't have a good example right now. For anything that the syntax, I mean, documentation and behavior should absolutely always work the same. If it doesn't, it's likely going to be a bug in the documentation. If you for example, look at other functionality like in an extension, there is almost as likely that the documentation is sometimes wrong than it is that the code's behavior is wrong. In that case, we need to have a good look at what what the expected behavior should have been. Now, with all the new features that have been put in, since we have the RFC process, pretty much anything that the RFC describes how it should work, is how the feature should work. And if it doesn't, that pretty much means there's a bug. Having said that, not everybody writes on all the expected behavior for all the functionality that an RFC has been put up for. And in those cases, you just need to see what makes the most sense whether it's about core feature.

Ignace Nyamagana Butera 4:40

What is the best way to report a bug? Okay, you have to go to bugs.php.net, I suppose. Yes. But apart from that, what is the best way to report a bug?

Derick Rethans 4:51

As you said, PHP is issue tracker is bugs.php.net. It tells you to fill in your problem, your expected behavior and what you actually get out, what is always really important for people to be able to fix an issue and to find out whether there is an issue to begin with, because that's not always the case either of course, is always to have a short reproducible script that reproduces your problem. And by short, that means it the short you can get it. 10 lines at most for most syntax features who probably do the job. In some cases, if it's a bug for a database related system, then of course, there's going to be some database setup necessary for it. But if it's just syntax, then a short script that reproduces the problem that shows what goes wrong, is really important. And of course, it's also important to say what it did, and what you expected it to do. Also, don't lie about your PHP version, because in some cases, people try to report a bug with a higher PHP version than they're actually using, which is kind of frustrating at times.

Ignace Nyamagana Butera 5:52

I guess that yeah, if we report something that didn't work in PHP five, but it was fixed in PHP 7.2 or PHP 7.3 everybody loses a little bit of time.

Derick Rethans 6:02

And in some cases people find a bug report for, say, PHP 7.4.1. Right, and we're currently at 7.4.6. We will always ask them first to upgrade if they can, because upgrading PHP should take a lot less time than trying to reproduce and fix a problem that has already been fixed.

Ignace Nyamagana Butera 6:20

And what is the strategy between the release of each version of PHP and the bug fix? Does PHP wait for all the bug fixes to be done and then a release is made. Or if for instance, I report a bug like today before a release is scheduled, then this bug will be skipped from the next release and will be tackled after

Derick Rethans 6:46

Every minor version of PHP, be at seven two, seven three, or seven four a moment, has a release every four weeks. Two weeks and two days before a release gets made, we make our release candidates. Everything that has made it in the release candidate will make it into the release. If in between the release candidate gets created and the final release, if bugs get fixed, unless they are really critical, they will make it into that release. But we'll have to wait until the next cycle. So we don't necessarily wait for all the bugs to be fixed before we make a release. Now, there is an exception here, and that is for security bugs. If you find security bugs, they don't end up in a normal PHP seven four branch. They get committed to a security repository that very few people have access to. And these security bug fixes. They get merged into the release branches two days before the release comes out. They don't end up in a release candidate builds because we don't want people 16 days to be able to exploit security bugs if they are remote exploitable, for example.

Ignace Nyamagana Butera 7:53

And can security bugs, or critical bugs push a release?

Derick Rethans 7:59

Technically, yes. If somebody ends up finding, like a remote exploitable bug in PHP, then there will be an emergency release for them. But I can't remember the last time we had to do that.

Ignace Nyamagana Butera 8:10

I remember, like one or two years ago, there was a bug that was going from the bugtrack to the internal mailing list and coming back again to the bugtrack, because there was some kind of indecision to know if it is a bug, or if it should be a feature. How is this possible?

Derick Rethans 8:32

We don't really have a set method for doing this. But our bug tracker isn't the most advanced system in the world. And sometimes it just makes sense to trash out a discussion over email on our PHP internals mailing lists, or sometimes these discussions happen on other chat channels as well I'm sure, just to go through to see what's the case. And sometimes if it is hard to take a decision while there's a bug, then it is always a good idea that more PHP core developers have a look at it and see what's going on there. So sometimes it makes it easier if that's discussed on the mailing list, then in the bug tracker.

Ignace Nyamagana Butera 9:04

Is it possible that for instance, someone submit an RFC. And then during the course of discussion of this RFC, it becomes clear that this is not an RFC, but more of a bug fix.

Derick Rethans 9:16

I don't think I can think of an example here actually.

Ignace Nyamagana Butera 9:19

I remember one example.

Derick Rethans 9:21

Okay.

Ignace Nyamagana Butera 9:23

Because I think it was yeah two years ago about the behavior of the CSV escape character. And I remember at some point, it was suggested to be an RFC. And because of the amount of background compatibility breaks, it was better to treat it like a bug. But I remember when between the bug tracker and the note sufficient there was a whole discussion to exactly being able to say: Okay, this is a bug. And this is an RFC and it was really not, it was a call at the end saying, okay, we will treat it like an RFC, and we will change the way the escape corrector works today. But it won't be as impacting as if it was an RFC that introduced a completely new behavior

Derick Rethans 10:12

CSV is a very difficult format, because everybody slightly implements a standard in a different way. And the way how it originally got implemented in PHP for reading CSV files was done in a very different way than for example, what Microsoft products would create. I mean, it has to do with escaping, if I remember correctly. And I mean, what do you decide, right? I mean, since then Microsoft have made a specification for this. And of course, what we then want to do in PHP is to make sure that we support a specification, but by doing so, we will then break previous behavior, and that is always a really difficult decision to do, right. If it is very clear that it is a bug, then we don't mind changing PHP, even though that could technically break people's code. But if it's unsure or whether it's based on a subjective decision, then that makes it a lot harder to write because we can't definitively say that, yeah, we have a bug here. But if we look at other codebase out there, so many people rely on this. So is the old behavior bug, or is it a feature in PHP? I mean, these things, you have to take one by one, and it's very hard to decide on what is what is a feature, and what is the bug in this case.

Ignace Nyamagana Butera 11:22

I think another subject that comes with bugs is people should be able to fix them. But I suppose that every one of us has a work and who can fix those bugs?

Derick Rethans 11:33

Technically, everybody who has time and know C code could fix a bug. PHP is an open source projects. Our repositories are available on GitHub, or on git.php.net, which is our source of truth, although most people submitted bug fixes against the GitHub repository because it makes it easier to review them and comment on pull requests, for example. But it's open for everybody. It's the same thing about triaging bugs. Trying to find out if the bugs that are actually reported are actual bugs and the bugs.php.net website has in the top right hand corner, it has a random link. And if you click that you get a random bug that hasn't been resolved yet. If somebody, if any of the listeners, or maybe you, are interested in looking at these bugs or wanting to attempt to fix them, click random and see what happens. Maybe you get something interesting, maybe because something really complicated, but in any case, it's possible for everybody to fix a bug. They will get reviewed. For a good enough bug fix it will get merged.

Ignace Nyamagana Butera 12:31

People are usually thinking when they think about open source nowadays they think about semver and people may think that if they look at the versioning of PHP, then they have an idea of it is a patch release, it is a bug release, it is a feature release. How is this related to bugs and how is it versioning of PHP working?

Derick Rethans 12:53

PHP's versions number consists out of three numbers. At the moment, we are the latest version is 7.4.6. The six is your bug fix release. In bug fix releases, there will not be any new functionality. Unless there are very minor, small contained parts in extensions. We tend not to want to have these. And unless you can make a good case for it, it's unlikely to happen. But it isn't unheard of. An example I think I can remember is that open SSL, added a bunch of new API's in there, and other technically new function functions in PHP, they sort of had to be supported, because as part of making sure that you could run the latest version of open SSL or something like that, but that being an exception there. Now, the middle number, traditionally, in semver, is there for features, right, you've bump the middle number, the middle digit, if you have new features, and that is the same in PHP. What we don't really have is a major number that indicates that we are going to break things. The major number in PHP is mostly a marketing number. So at the moment, we have PHP seven four out there. We don't have PHP eight zero next. But that is pretty much a PHP seven five, but with additional functionality that we find important enough to bump the major version from seven to eight for. Having said that, we do have a rule that we don't remove functionality, unless we bump the major number. For example, from five to seven, or from seven to eight. So there will be in the course of time, we might deprecate functionality, we don't tend to remove that until we bump the major number. And you also see that if the major number gets increased, that there is potentially more effort in removing or deprecating more functionality that would otherwise do say for example, it changed from 7.3.0 to 7.4.0. But it doesn't mean that we don't bump major numbers so that we can break all the things for example. So I think the PHP protect tries to, we don't always succeed of course, try to never break people's code. Unless it's a bug fix

Ignace Nyamagana Butera 15:03

That was it for my questions.

Derick Rethans 15:06

Maybe I have some questions for you now. I think it is good to talk about these issues. What are you most surprised with in the way how the PHP process handles bugs and bug reports?

Ignace Nyamagana Butera 15:15

The first thing is, like I say, I've been coding in PHP for more than 15 years, but I only started really to report bugs once I start doing some open source project. Because before I think, and I think it's the majority of people, it's like, yes, there is a bug, oh it's something for PHP, or for any kind of language. I'm not the maintainer. So it's a bug, someone else will report it not to me. Since I've changed because I'm doing myself some open sourcing. I'm like, hey, if I found a bug, I think the best way to resolve that bug is first, to report it and to report it correctly, to the project, to the language or to whatever has that bug. And once you've made this change of how you think about the language, then you start to ask yourself, okay, how can I do it the most efficient way so that the bug get reported? And then the bug can get tackled by the people who can.

Derick Rethans 16:19

Yeah, and the start of that, as you say's, always send us a bug report or sent your favorite open source project a bug report.

Ignace Nyamagana Butera 16:26

Exactly.

Derick Rethans 16:27

I can sort of see where you're coming from. Because I can understand that if you're just in an agency, for example, and the only thing, the only thing you have to do is to make sure that your project is done on time. You can't necessarily wait for the bug to be fixed in PHP anyway, because the product needs to be done by tomorrow or yesterday. And you're going to have to find a workaround you issue in that case anyway. And then you spending time reporting the bug will just takes you time and you don't have time for that, for example. But of course, if you do that, then everybody else that runs into this bug will have to come up with a workaround, and that means you're all end up wasting lots of time.

Ignace Nyamagana Butera 17:04

I remember I had a small story. In one of my previous jobs, someone came to me and we're talking about something and he said: Oh, but there is no constant on the DateTimeImmutable. That's very sad. And I said: no, there is because I remember I submitted the bug, and it was tackled. And now the constants are on the interface. So DateTimeImmutable has the constant and was like: Oh, yeah, but I didn't know. And I was; it was reported and someone use it. And if you don't report it, then maybe in two years, you will ask yourself the same question. Indeed, it takes time. Between the moment it is reported the moment it is tacked, because people need to have time to resolve the issue. But if you don't do the first step, which is reporting it correctly, then it will never be solved.

Derick Rethans 17:53

And by correctly that also means doing in the PHP bug tracker and not complaining on Twitter.

Ignace Nyamagana Butera 17:58

Exactly. Exactly.

Derick Rethans 18:02

Of which I see quite a bit of for Xdebug for example. Thank you very much for taking the time to talk to me, or I should say thank you very much for taking the time to interview me to talk about bugs today. I hope you enjoyed this.

Ignace Nyamagana Butera

Thank you for having me. And hopefully we'll meet again.

Derick Rethans

I'm looking forward to that. Thanks very much.

Ignace Nyamagana Butera 18:21

Thank you.

Derick Rethans 18:23

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 54: Magic Method Signatures

PHP Internals News: Episode 54: Magic Method Signatures

In this episode of "PHP Internals News" I chat with Gabriel Caruso (Twitter, GitHub, LinkedIn) about the "Ensure correct signatures of magic methods" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 54. Today I'm talking with Gabriel Caruso about his ensure correct signatures of magic methods RFC. Hello Gabriel, would you please introduce yourself?

Gabriel Caruso 0:37

Hello Derick and hello to everyone as well. My name is Gabriel. I'm from Brazil, but I'm currently in the Netherlands. I'm working in a company called Usabila, which is basically a feedback company. Yeah, let's talk about this new RFC for PHP eight.

Derick Rethans 0:52

Yes, well, starting off at PHP eight. Somebody told me that you also have some other roles to play with PHP eight.

Gabriel Caruso 0:59

Yeah, I think last week I received the news that I'm going to be the new release manager together with Sara. We're going to basically take care of PHP eight, ensuring that we have new versions, every month that we have stable versions every month free of bugs, we know that it's not going to happen.

Derick Rethans 1:17

That's why there's a release cycle with alphas and betas.

Gabriel Caruso 1:20

Yeah.

Derick Rethans 1:21

I've been through this exactly a year early, of course, because I'm doing a seven four releases.

Gabriel Caruso 1:25

Oh, nice. Yeah. So I'm gonna ask a lot of questions for you.

Derick Rethans 1:29

Oh, that's, that's fine. It's also the role of the current latest release manager to actually kickstart the process of getting the PHP, in this case, PHP eight release managers elected. Previously, there were only very few people that wanted to do it. So in for the seven four releases it was Peter and me. But in your case, there were four people that wanted to do it, which meant that for the first time I can ever remember we actually had to hold some form of election process for it. That didn't go as planned because we ended up having a tie twice, which was interesting. So we had to run a run off election for the second person between you and Ben Ramsey, that's going to go continuing for you for the next three and a half years likely.

Gabriel Caruso 2:11

Yep.

Derick Rethans 2:12

So good luck with that.

Gabriel Caruso 2:13

Thank you. Thank you very much.

Derick Rethans 2:15

In any case, let's get back to the RFC that we actually wanted to talk about today, which is the ensure correct signatures of magic methods RFC. What are these magic methods?

Gabriel Caruso 2:24

So PHP, let's say out of the box, gives the user some magic methods that every single class have it. We can use that those methods for anything, but basically, what magic methods are are just methods that are called by PHP when a given action happens to the class. So for example, if a class is being constructed, then the construct magic method is going to be called. If I'm calling serialize function, then the magic method serialize as per PHP seven four or PHP eight. I don't remember, so this is basically what magic methods are, are methods that PHP hook into the classes and then once a certain action happened with the class, then PHP is going to call those magic methods in something magic, so to speak is going to happen.

Derick Rethans 3:13

And other options are like underscore underscore get, and underscore underscore set.

Gabriel Caruso 3:17

We have, we have a lot.

Derick Rethans 3:19

Exactly, what do people tend to use these magic methods for?

Gabriel Caruso 3:22

So that's something interesting. As the magic method is called by a number of actions we can use, for example, for let's let's get the example of ORM for example, Doctrine or Eloquent or whatever one. Let's say I'm a maintainer of that library. I don't know what fields do you have in your database. So when I'm porting, when I'm doing the translation, what it can do is map in a property, all those columns and values that I have in the database. And then when you instantiate your entity and you try to access a variable that is does not exist, then we're going to go to a magic method in this case is get, as I said, and I'm going to say okay, is not set in the class, but is mapped in the entity that I have. So this is one case, we also have the case for testing your you have, for example, the famous PHP Unit test framework, every time that a test case is called with all those methods is starting in with test, the call magic method is invoked. And then you can perform whatever action you have. You also have middlewares and the examples go go even further

Derick Rethans 4:32

In the title of RFC you have the word signature, what is the signature?

Gabriel Caruso 4:37

All the attributes that our method can have. So for example, the name of a method is its signature, what does it return? What parameters does it take? And also what modifiers so for example, is it static or not? Is it public, private or protected? So all this information together in usually is one line in PHP. So for example, private static MyMethod, that receives a string and returns a Boolean. There you go. This is the signature of my method

Derick Rethans 5:06

Because some of these magic methods have been in PHP for a long long time. Back in the time where we didn't have argument types or return types or perhaps not even static. All the way back from the past PHP hasn't really done anything with signatures because they've simply didn't exist. At the moment which signature checks this PHP already do?

Gabriel Caruso 5:26

I don't remember a by the RFC but I think was introduced together with the scalar type RFC. But only constructors and destructors until PHP seven four, those two only magic methods were being checked. If they have none return type, not even void, just no return type. But in PHP eight, we're gonna have the new stringable interface and then every single toString magic method. If it is typed, this is very important if it is typed it needs to be a string and these are the only from the 17 that we have only three in PHP 8 are being checked.

Derick Rethans 6:01

PHP seven four.

Gabriel Caruso 6:02

Yeah, in PHP seven four only two and then PHP eight, we have the new toString.

Derick Rethans 6:07

But this RFC suggesting to change that of course.

Gabriel Caruso 6:10

yeah.

Derick Rethans 6:11

What's the reason why you want to extend these checks to the other magic methods?

Gabriel Caruso 6:14

That brings me back how I figured out that. I was looking at some bugs, because we have the https://bugs.php.net, where we centralized all the bugs of PHP. Then there is a bug report explaining in complaining exactly about that. Like, I can't hide my magic method. Back in the days I can say, for example, that my tostring method is going to return an integer or a Boolean. That makes no sense. And then I was like, yeah, makes makes no sense. We need to fix that out and then I start to search how do we type that? How what types do we have and then I was like, we can't in PHP eight, because this is going to be a new major version. So we are allowed to at least vote for do that. We can check if someone is using types, we can check those types. We are not going to force, we are not going to require, we're not going to evaluate even run static analysis. Nope, we're going to simply check. Okay. Are you saying that this get magic method is going to return anything? Okay, that's okay. Oh, but I want to my guess is that you specifically return a string. That's also okay. As to how to pronounce that liskov mistook principle, right?

Derick Rethans 6:36

The liskov substitution principle.

Gabriel Caruso 7:26

Yeah. And so this is what we're going to basically do with this RFC, there's going to be voted. We're going to simply check if you're using the right types, because, in my opinion, magic methods are a foundation in PHP. As we have theses methods across different code bases across different projects from different behaviours, at least when I'm looking at that code. Okay, I'm looking at this magic method. I know what parameters does it take. I know what return does it have. This is worth less tab to the bug are trying to understand what is happening. Because today maybe I'm debugging a toString method there is return an integer. And I'm like, okay, this is the bug, it's supposed to return a string. But once you ensure those all those signatures, is one less bug that we're gonna have in production.

Derick Rethans 8:17

When are these signatures being ensured?

Gabriel Caruso 8:19

It's not at compile time because he does not have a compile time. But he's when the Zend machine is compiling the code, we have a very specific method that is checking all the modifiers. So for example, the signature that we mentioned before so all the magic methods needs to be public. This has been checked, for example, they callStatic magic method needs to be static. So this has also been checked. And then I'm extending how do we check for signatures for param types and also for return types. So during compilation of the Zend VM.

Derick Rethans 8:52

Taking as example callStatic in the RFC, I see that the name has to be a string and the arguments has to be an array. What happens if you use a different type there?

Gabriel Caruso 9:01

So nowadays if you use a different type that's allowed. So if you say there, you're going to receive an integer, and you're going to receive a string. This is allowed today. And this is what I mentioned about when you are debugging or analyze different code bases, you're going to be like why in the documentation says that we need to receive a string and an array, and there's this specific code base is receiving a string and an integer. So this is what kinds of mismatch I want to avoid. Of course, when using types, because we also know that PHP in some projects does not use types. And that's perfectly fine. If you're not using types, I'm not going to ask you, hey, you need to type those magic methods. Well, what I'm going to do is okay, you're using types and I need to make sure they're using right otherwise this is going to be a mess.

Derick Rethans 9:47

If you type it; say use an integer for the name of underscore underscore get, will give you a warning or a compile error, or parse error? What what kind of feedback which you get back from that?

Gabriel Caruso 9:59

While you are running your code, as soon as that class get referenced, we're going to check. Is not when is initiated, when is not when is called, as soon as I think the autoload detects that class is gonna parse, is going to identify, and then is going to compile and during the compile time that we mentioned. We're going to identify that. So it's going to be early in the stages. Perhaps as soon as you run something or you would upset me, you're going to have that feedback saying: hey, this is not compatible with what we are expecting.

Derick Rethans 10:32

Is that a warning or type error?

Gabriel Caruso 10:34

It's going to be a fatal error, because this is what we are constantly returning with the destructors and constructors.

Derick Rethans 10:41

Yeah, we alluded to mixed already a little bit and the RFC mentioned mixed a few times, of course mixes in the type and PHP yet. So what do you want to do about that?

Gabriel Caruso 10:51

Today we are 11th of May of 2020. Right now we have an RFC voting in PHP to introduce the mixed type. I'm not going to say if I agree or disagree, it's being voted. If that RFC gets accepted then I have already talked with the authors of the that RFC, I'm going to wait until they merge into master. I'm going to rebase and readapt to my RFC, to have those mixed types. And there we go PHP eight probably can have mixed, and probably can already have the usage of mixed in the magic methods. So either No, I'm gonna need to wait for the end of their RFC. If it's approved, there go I need to rebase my PR. In the other case, we are going to keep as comments because we can't ensure that in the compile time with the VM.

Derick Rethans 11:41

At the moment, it looks like that vote will and in May 21. The current votes are 35 to six for passing. So it looks like that will go through

Unknown Speaker 11:50

And then I need to rush because we have the upcoming feature freeze of PHP eight. So I need to make sure that I start to vote and implement my RFC before that time.

Derick Rethans 12:00

Feature freeze should be by the end of July. So I think you have plenty of ime pfor that. And of course you have a release manager, you can make an exception. That's how that works. Usually adding extra checks will have impact to existing code. Is there much impact to existing code here as well?

Gabriel Caruso 12:18

That was the interest question that I made myself. Okay, I'm going to touch the magic methods of PHP. I'm going to break some code in an issue identified those breaking changes in an each map in the RFC. How do I map across many projects, many libraries, many PHP codes out there? How do I do that? I remember that Nikita back in his RFC about the parenthesis origin, like how do we present this ordering and yada yada yada. He made a script, where he went through I think was the top thousand or top 10,000 packages. On packagist, that is the official composer package provider and he identified everything, and ask myself how he did that. And actually was very easy. He just cloned other repositories. He instantiate a new PHP parser instance that is his magic parser. That is behind PHP Stan, is behind psalm, is behind a lot of infection, a lot of big projects, where you analyze the code. So you have a code base where you can analyze and say: Do I have magic methods wrong? And then I run this script, identify, I think six or seven types that were not perfect. Three of them. I have already submitted a request because we're in PHP Unit and I said to Sebastian: hey, this actually is not right. Because I'm proposing this RFC, he was like: Okay, perfect, let's merge it. And the other cases are the cases that I mentioned. For example, with get. Get, you need to return mixed but by the LSP, you can nail down to an integer or a string. So there you go, at least in the top 10,000 packages of composer is not going to be a breaking change. But of course, it's going to be breaking change for people that I can't map. So this is why it's mentioned the RFC that if you're using types with magic methods wrong, we're going to warn you.

Derick Rethans 14:13

But at least it's an easy thing to check for. Because even running all your files through PHP minus L should catch it.

Gabriel Caruso 14:20

Yeah, there you go.

Derick Rethans 14:22

So it's a very easy to check for something. You provided a link to Nikita's script where he checks for those ternairies, do you have a version of your own script available as well?

Gabriel Caruso 14:33

That's interesting. I thought the RFC was updated. So I'm going to update the RFC, because I do have the script locally.

Derick Rethans 14:39

Then I can link to it for the podcast as well.

Gabriel Caruso 14:41

Okay, perfect.

Derick Rethans 14:42

In the future, are you thinking of extending checks to a few more things?

Gabriel Caruso 14:46

So this is something that I fought about this RFC, like how much you want to break and explode people's code. And I think starting with checking types in the signature is the first step. The next step is to actually check the return type. We do that with toString. So for example, although you have type right for maybe, some logic or something is wrong, you're returning an integer. There is a check before the actual type saying you're supposed to return a string you're return an integer. And actually, there is a check in the magic method saying this magic method was supposed to return a string. I think is gonna break even more code because then it's something that I can't measure. So I was like: Okay, let's first start with types and then we can give it next step that is: okay, inside this method, what is being returned, okay, is something different from the signature: explode. You're returning something that I was not supposed to return. But this is not a fight that I'm going to pick. So I leave it up for the next major version of PHP or whatever.

Derick Rethans 15:49

Wouldn't PHP's strict versus weak type mechanism already catch these things. So from debugInfo, if you would type that as returning an array, and then you end up returning an object, which is not necessarily wrong, just not what you expected. PHP's return type checking mechanism should already catch that for you.

Gabriel Caruso 16:13

If you have a magic method typed. If it's not typed, so we can say that some efforts do have that check. And then we're going to expand when we don't have types in the signature.

Derick Rethans 16:24

That's clear now. Do you have anything else to add?

Gabriel Caruso 16:27

The only thing that I want to add that is, I have created another RFC, and this is something that I always tell everyone that is easy to do; is not impossible. Anyone can go there, identify a bug or catch a bug report and then try to fix it. And this is what I'm doing. Like I'll do them to release many of PHP eight. I'm also fixing bugs, improving documentation and everything else. This is something that I try to do and share with everyone. So everyone can also be the next one contributor to the to PHP and it's evolution.

Derick Rethans 16:57

This RFC isn't out for voting yet. You set you want to sort of wait until mixed gets passed or not. What's the reception been so far?

Gabriel Caruso 17:05

So I asked a couple of key members of the PHP community, both internal and external people. They agree, they said that the right approach is to first check for the signature, because if someone is already using types, that project is type friendly, so we can at least play with that. But if someone is not typing, then this is a bigger fight. And then we're going to talk about that in the future.

Derick Rethans 17:29

Thank you, Gabriel for taking the time this morning to talk to me. I've learned a few more things about this RFC, so that's always good to know. And again, congratulations of being the PHP eight release manager together with Sara.

Gabriel Caruso 17:41

Thank you very much. Also thank you for inviting me for this new podcast is amazing. Always listen to all these famous people of PHP that talked with you. And I'm like, Whoa, Derick has invited me this is going to be so much fun. Thank you very much.

Derick Rethans 17:55

Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystify the development of the PHP language, I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to Dderick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 53: Constructor Property Promotion

PHP Internals News: Episode 53: Constructor Property Promotion

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the Constructor Property Promotion RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 53. Today I'm talking with Nikita Popov about a few RFCs that he's made in the last few weeks. Let's start with the constructor property promotion RFC.

Nikita Popov 0:36

Hello Nikita, would you please introduce yourself? Hi, Derick. I am Nikita and I am doing PHP internals work at JetBrains and the constructor promotion, constructor property promotion RFC is the result of some discussion about how we can improve object ergonomics in PHP.

Derick Rethans 0:56

Object economics. It's something that I spoke with Larry Garfield about two episodes ago, where we discuss Larry's proposal or overview of what can be improved with object ergonomics in PHP. And I think we mentioned that you just landed this RFC that we're now talking about. What is the part of the object ergonomics proposal that this RFC is trying to solve?

Nikita Popov 1:20

I mean, the basic problem we have right now is that it's a bit more inconvenient than it really should be to use simple value objects in PHP. And there is two sides to that problem. One is on the side of writing the class declaration, and the other part is on the side of instantiating the object. This RFC tries to make the class declaration simpler, and shorter, and less redundant.

Derick Rethans 1:50

At the moment, how would a typical class instantiation constructor look like?

Nikita Popov 1:55

Right now, if we take simple examples from the RFC, we have a class Point, which has three properties, x, y, and Zed. And each of those has a float type. And that's really all the class is. Ideally, this is all we would have to write. But of course, to make this object actually usable, we also have to provide a constructor. And the constructor is going to repeat that. Yes, we want to accept three floating point numbers x, y, and Zed as parameters. And then in the body, we have to again repeat that, okay, each of those parameters needs to be assigned to a property. So we have to write this x equals x, this y equals y, this z equals z. I think for the Point class this is still not a particularly large burden. Because we have like only three properties. The names are nice and short. The types are really short. We don't have to write a lot of code, but if you have larger classes with more properties, with more constructor arguments, with larger and more descriptive names, and also larger and more descriptive type names, then this makes up for quite a bit of boilerplate code.

Derick Rethans 3:16

Because you're pretty much having the properties' names in there three times.

Nikita Popov 3:20

Four times even. One is the property name and the declaration, one in the parameter, and then you have to the assignment has to repeat it twice.

Derick Rethans 3:30

You're repeating the property names four times, and the types twice.

Nikita Popov 3:34

Right.

Derick Rethans 3:36

What is the syntax that you're proposing to improve this?

Nikita Popov 3:39

The syntax is to merge the constructor and the property declarations. So you only declare the constructor and you add an extra visibility keyword in front of the normal parameter name. So instead of accepting float x in the constructor, you accept public float x. And what this shorthand syntax does is to also generate the corresponding property. So you're declaring a property public float x. And to also implicitly perform this assignment in the constructor body. So to assign this x equals x, and this is really all it does. So it's just syntax sugar. It's a simple syntactic transformation that we're doing. But that reduces the amount of boilerplate code you have to write for value objects in particular, because for those commonly, you don't really need much more than your properties and the constructor.

Derick Rethans 4:40

Besides public, I suppose you can also use protected and private there as well.

Nikita Popov 4:45

That's right. So you can use all the visibility modifiers. Well, public protected private, static does not really make sense. But if we add other modifiers in the future, then those could be used there as well for example, if we add support for read only properties, then of course, you could also write public readonly float x or something.

Derick Rethans 5:09

The RFC talks about desugaring. How's this implemented? Is this transformation on in the AST, or in another way?

Nikita Popov 5:17

This is not an AST transform, but I would say close enough. So we just generate the corresponding property declarations and assignments in the compiler. If you inspect the AST with an extension like PHP AST, you will see the code as written. So with the public in front of the parameter name, but if you inspect the code in reflection, then it will look as if you declared the property explicitly.

Derick Rethans 5:48

So the RFC talks about a few constraints and what you can and cannot do with those promoted properties. One of the things it talks about is nullability.

Nikita Popov 5:58

Well, we have two different nullability semantics in PHP for historical reasons. One is in parameters, where we say, if you use a type that is not explicitly nullable, but you have a null default value, then we make the type implicitly nullable. While for property types, which are newer, we no longer have this implicit behaviour. So if you want to have a nullable property, you do need to explicitly mark it as nullable. Just using a null default value on will result in an error. And the handling is the same here. So if you want to have a nullable promoted property, you have to mark it as nullable

Derick Rethans 6:43

And you cannot just rely on setting the default to null?

Nikita Popov 6:46

Exactly, but I think it's like detail. And really this could go either way. I just prefer the explicit nullability because this seems like the direction we are going to in the future. I don't know if we will ever remove this implicit behaviour. Maybe not. But I think nowadays explicit one is preferred.

Derick Rethans 7:10

Less magic is better.

Nikita Popov 7:11

Less magic, exactly.

Derick Rethans 7:13

The RFC also has like constraints in there. You can also define a constructor in traits and abstract classes. Can you also use a constructor property promotion there as well?.

Nikita Popov 7:23

In traits? Yes, I mean in traits, using it will be a little bit weird. But there is no reason why it can't work. After all traits can have a constructor that will be used in the using class. And traits can also have properties that get imported. So the same mechanism works there as well. It does not work for abstract constructors or constructors in interfaces. The syntax also implies that you have some assignments inside the body of the constructor, and if we have an abstract constructor, then we could not emit these assignments anywhere. We could support it as a special case, like saying that it only declares the properties but skips those assignments. But I know how often you've used abstract constructors, I probably used them like maybe once or twice in all my time working with PHP. So either they really need extra support in that area.

Derick Rethans 8:25

It would also then introduce an inconsistency were promoted properties in abstract classes or abstract class constructors if that's the thing, would be different from normal class constructor property promotion. How does the inheritance work? Is the working in the same way or is there no specific difference in it?

Nikita Popov 8:44

Based on like discussion feedback, I think inheritance is the largest point of confusion with this syntax. The thing is that does not really have any special interaction with inheritance. So you can just follow this like syntactical transformation it does, which does not have any impact on inheritance. But the thing is, if you just look at the code, and you see you have the parent class defining the constructor, and the child class defining the constructor, and then you're wondering, well, is there some kind of connection between the parameters? The promoted parameters declared in one constructor and the other one? And the answer is simply: No, there isn't. Those have nothing to do with each other. And even more generally, constructors are a bit of a special case where inheritance is concerned. So usually, we say that methods always have to be compatible with the parent method. So the signature has to be compatible, the return type has to be well not match, but be contravariant. And similar for the argument types, but this rule does not apply for the constructor. So the constructor really belongs to a single class, and constructors between parent and child class do not have to be compatible in any way.

Derick Rethans 10:09

Are there any types that you can't use for constructor property promotion?

Nikita Popov 10:14

Just callable. Because callable is not a valid property type. Well, there is one more thing that you can't use a variadic argument. Well, if you write a variadic argument, you write something like int, dot, dot, dot, whatever. But the type you're actually writing is int, because that's the type of each individual argument. But all of that gets collected into an array. So the type of the corresponding property would have to be array. So we would have to do an extra transform that's maybe not super obvious. And so I've left this part out.

Derick Rethans 10:50

And also PHP's type system doesn't support defining an array of integers. It only supports describing an array. At a time we're talking about is, at the end of April, this hasn't gone up for a vote yet. When do you think this will happen?

Nikita Popov 11:05

The RFC will need one small adjustment because the attributes RFC is currently in voting and it very much looks like it's going to be accepted. We will need to also consider support for attributes on the promoted properties. I think the only small question there is, what does the attributes apply to? Because this could apply to the parameter or to the property, or both.

Derick Rethans 11:34

How would you actually set these attributes because from what I understand docblocks, you can only use in front of a method name or a property declaration. How would you define a different attribute for each of the promoted properties?

Nikita Popov 11:48

I believe that the attributes RFC already supports attributes on parameters, so that shouldn't be a problem.

Derick Rethans 11:55

So it allows for setting a specific attribute for each of the arguments coming into the constructor. But that didn't quite answer the question. When do you think we'll be voting on this?

Nikita Popov 12:05

Maybe in a week or so.

Derick Rethans 12:06

By the time this podcast comes out?

Nikita Popov 12:09

Well, we have had a lot of activity recently in PHP internals. So I guess we are one of the few places that benefit from the Coronavirus, because people now have time to work on PHP.

Derick Rethans 12:24

Yeah, I mean, I'm looking at so much extra code now. Interestingly, when going to the RFC, and as a side note, it mentioned somewhere that when defining more properties, the line length goes too long, because you now have this extra keyword in there. And that could benefit from then separating the constructor arguments over multiple lines. And that that raises the point is that you can use a trailing comma in arrays when you call functions, but not in argument lists. And I saw that you've also made another RFC for adding the trailing commas in the parameter lists.

Nikita Popov 12:58

So there's like a super simple RFC, just allow that extra comma. This has actually already been discussed a couple of times in the past, and has not, has been declined that point.

Derick Rethans 13:13

I'm just having a quick look at it. Because this RFC is already voting to see what the current votes are, and it's 58 for and one against.

Nikita Popov 13:21

I think like the main counter argument people have against this kind of trailing comma stuff is, well, doesn't that mean that it encourages writing methods with a lot of parameters, which is a bad style. I don't think it does. And I think that even if you don't have a lot of parameters, it's fairly easy to run into line length limitations, because nowadays like to use expressive long parameter names, and expressive long type names, so even without adding an extra protected in front of all of that, you can really easily get signatures that split across multiple lines. In which case having the trailing comma is nice, mainly because we already write it everywhere else.

Derick Rethans 14:12

Except for in arguments to methods, because you can't.

Nikita Popov 14:17

Well, there are also a couple of other places where you can't. For example, like if you have a class implements, and then implements many interfaces, then you can't put a trailing comma after the last interface. And this is something we could also allow. But I think the relevant distinction there is that this is kind of a freestanding list. Um, it's not wrapped inside brackets, or parentheses. So it kind of looks a little bit weird if you have a trailing comma there, which is possibly also why previous RFC on that simply allowed trailing comma everywhere did not pass.

Derick Rethans 14:58

As I said, it looks likely that will pass.

Nikita Popov 15:01

Yes, I think it's unlikely that we're going to get 13 new no votes.

Derick Rethans 15:07

What I also find interesting is that an RFC that you've mentioned earlier in the episode is that attributes are going to pass as well. At the moment, there's only one no votes there as well, which surprised me because the last time attributes was discussed was very much not going to pass whatsoever.

Nikita Popov 15:27

Yeah, this is an interesting effect. It's hard to say why it happens. Probably, well, part of the reason is just that issues that were raised on previous proposals have been addressed. For example, the last one by Dmitri had the very controversial aspects where it's exposed the AST. The abstract syntax tree representation of the attributes, which has gone from this one, and thus removes one of the contentious issues. But I think another part is just that sometimes it takes multiple proposals to really get an idea through internals. We have this situation pretty commonly that though the first RFC fails, second RFC fails, and then the third one does pass.

Derick Rethans 16:18

It's also it's taken five years or so. And people's opinions might just change about these things.

Nikita Popov 16:23

Exactly. The previous proposals might just have been before their time.

Derick Rethans 16:29

I saw you had made one other tiny RFC, which is the stricter type checks for arithmetic slash bitwise operators. What is that about?

Nikita Popov 16:40

Very simple. So if you're write, well, x minus y, and x is an array. And y is a resource, like what do you expect the outcome to be? There is really no reasonable way that can work. So this RFC proposes to make the arithmetic and the bitwise operators, when working on arrays, when working on objects, and working on resources, simply throw an exception. And the motivation for that was the operator overloading RFC, which has in the meantime been declined. But still, this was a concern raised there that while you can overload operators for objects, but you still get pretty weird behaviour if an overloaded operator is missing, because we currently handle that with just a otice and assuming that the object is equal to one, which is usually not a useful or desired behaviour.

Derick Rethans 17:39

There is of course, one exception where you can still use an arithmetic operator, which is the plus between arrays.

Nikita Popov 17:46

That's right, yeah. So array plus array is similar to an array merge operation. And that one is of course, well defined and remains supported

Derick Rethans 17:55

Whereas things like true divided by 17, although not sensible, it'll continue to work.

Nikita Popov 18:00

Right, that also. Yeah, so because this is simply a much more contentious issue whether, like implicitly treating true as one is a good idea or not. Personally, I know I have written code where I, for example, add up booleans. Just as a count of how often something is true. This is like maybe maybe, style wise it would be better to write an explicit integer cast. But the code is also not really wrong. This may be as a discussion for another time.

Derick Rethans 18:33

As we've said before, the smaller the RFCs, the easier it is to get them passed as well. Alright, Nikita, thanks for taking the time this morning to talk to me about constructor property promotion RFC, and a few others. We'll see whether they get passed for PHP eight.

Nikita Popov 18:48

Thanks for having me Derick, once again.

Derick Rethans 18:52

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 52: Floats and Locales

PHP Internals News: Episode 52: Floats and Locales

In this episode of "PHP Internals News" I talk with George Banyard (Website, Twitter, GitHub, GitLab) about an RFC that he has proposed together with Máté Kocsis (Twitter, GitHub, LinkedIn) to make PHP's float to string logic no longer use locales.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 52. Today I'm talking with George Banyard about an RFC that he's made together with Mate Kocsis. This RFC is titled locale independent floats to string. Hello, George, would you please introduce yourself?

George Banyard 0:39

Hello, I'm George Peter Banyard. I'm a student at Imperial College and I work on PHP in my free time.

Derick Rethans 0:47

All right, so we're talking about local independent floats. What is the problem here?

George Banyard 0:52

Currently when you do a float to string conversion, so all casting or displaying a float, the conversion will depend on like the current local. So instead of always using like the decimal dot separator. For example, if you have like a German or the French locale enabled, it will use like a comma to separate like the decimals.

Derick Rethans 1:14

Okay, I can understand that that could be a bit confusing. What are these locales exactly?

George Banyard 1:20

So locales, which are more or less C locales, which PHP exposes to user land is a way how to change a bunch of rules on how string and like stuff gets displayed on the C level. One of the issues with it is that like it's global. For example, if you use like a thread safe API, if you use the thread safe PHP version, then set_locale() is not thread safe, so we'll just like impact other threads where you're using it.

Derick Rethans 1:50

So a locale is a set of rules to format specific things with floating point numbers being one of them in which situations does the locale influence the display a floating point numbers in every situation in PHP or only in some?

George Banyard 2:06

Yes, it only impacts like certain aspects, which is quite surprising. So a string cast will affect it the strval() function, vardump(), and debug_zval_dump() will all affect the decimal locator and also printf() with the percentage lowercase F, but that's expected because it's locale aware compared to the capital F modifier.

Derick Rethans 2:32

But it doesn't, for example, have the same problem in the serialised function or say var_export().

George Banyard 2:37

Yeah, and json_encode() also doesn't do that. PDO has special code which handles also this so that like all the PDO drivers get like a constant treat like float string, because that could like impact on the databases.

Derick Rethans 2:53

How is it a problem that with some locales enabled and then uses a comma instead of the decimal point. How can this cause bugs and PHP applications?

George Banyard 3:02

One trivial example is if you do, you take a float, you convert it, you cast it to string, and then you cast it back to float. If you're on a locale, which is the dot decimal separator, you will get back the original float. However, if you have like locale which com... which changes the decimal separator, like the German one, you'll get a string; you'll get like three dash, three comma 14, and then when you convert it back to float, you will only get three because PHP doesn't recognise the comma as a decimal separator in its string to float conversion and so it will loses the decimal information.

Derick Rethans 3:39

That doesn't seem particularly very useful as a feature. So my question here is we talked about floating point numbers and, and I think floating point numbers have other issues as well. Not sure whether we want to go into the details of how floating point numbers and computers work, but we can if you want to.

George Banyard 3:56

The easy way to explain floating points is to use like exponential notation, or to use the scientific exponential notation, which most people will know from engineering or physics, where you usually have like, one significant like the number, like a comma, a couple of numbers, and then you have like an exponent which raises it to usually, so to your power 10 to the something, which then gives you an order of magnitude. Floating points, basically that but in base two.

Derick Rethans 4:26

Positions have magnitudes attached to them. They're all powers of two.

George Banyard 4:30

Yeah.

Derick Rethans 4:31

And of course, when we use numbers an decimal, like pi being a bad example.

George Banyard 4:36

Once said.

Derick Rethans 4:37

I was going to say if you divide 10 by three, you get 3.33333 that never ends, right. And I reckon if you have a specific number in decimal like three point 14, then you can't necessarily always exactly represent it in binary.

George Banyard 4:55

Yeah, one common example would say it's like one 10th which has like a perfect representation in decimal. But like in binary is a never ending repeating sequence. When you try to like display naught point one, like how it's saved in floating point, it's really weird and everything to get like these rounding errors which can propagate.

Derick Rethans 5:15

And hence you often hear people recommend to never use float for things like monetary values, but then as you said that you sentence that right?

George Banyard 5:23

Yeah, put everything in integers and work with integers and just like format it afterwards.

Derick Rethans 5:29

So let's get back to what you and Mate are actually suggesting to change. What are the changes that you want to make through this RFC?

George Banyard 5:36

The change's more or less to always make the conversion from float to string the same, so locale independent, so it always uses the dot decimal separator, with the exception of printf() was like the F modifier, because that one is, as previously said, locale aware, and it's explicitly said so.

Derick Rethans 5:56

Doesn't printf also have other floating related format specifiers? I believe there's an E and a G as well. And uppercase F. What is the difference between these?

George Banyard 6:06

Lowercase F is just floating point printing with locale awareness. Capital F is the same as lowercase, but it's not locale aware. So it always uses the dot decimal separator. Lowercase E is, what I've learned recently also locale aware, and uses the exponential notation, like with a lowercase e. Uppercase E is the same as lowercase E, but instead of having a small like a lowercase e in the printing format, it's a uppercase E, and lowercase G has some complicated rules onto when it decides which format to choose between lowercase F and lowercase E, depending on like how big like the number of significant digits are after the comma, or like the dot. And uppercase G is the same but using uppercase F and uppercase E instead of lowercase E and lowercase F.

Derick Rethans 6:58

And all of them can be locale dependent then except for uppercase F.

George Banyard 7:02

Yeah.

Derick Rethans 7:02

Do you think this is going to impact people's applications, if you change the default of normal casts to be locale independent?

George Banyard 7:10

I would have expected it to not be that significant. And only that would affect displaying floating point. So if you're like in Germany, instead of like seeing a comma, you would now see a dot, which can be annoying, but I wouldn't imagine is the most, the biggest problem for you like end users. But apparently, people made tooling to work around the locale awareness of it. And so they could maybe break with passing stuff, which I suppose that happens because it's been, PHP's 25 years old. And that behaviour has been there for like ever. So people worked around it or work with it.

Derick Rethans 7:49

Is this going to be purely a displaying change or something else as well?

George Banyard 7:54

For example, if you would send like a float to like an API via HTTP, you would usually already need to have like code around to like work around like the locale awareness, or like all by resetting set locale or by using number_format or like sprintf or something like that. Because most other APIs or like you would like contact would expect like the float to use like a decimal point. PHP. If you do the string to float conversion again, which was not a point, then you get only an integer basically.

Derick Rethans 8:27

Because PHP's parser, strips it out once it stops recognising digits, which is in this case, the comma.

George Banyard 8:33

Yeah, that would make the code nicer. The main reason why me and Mate like decided to propose this RFC is because like most APIs, and also databases and everything, expect strings to be formatted in like a standard way. Currently, like if you for whatever reason, use a locale, then it's not, but yeah, like apparently people worked around that when they were maybe stripping stuff from like HTML whatever displayed and try to work around it because that got raised in the list quite recently.

Derick Rethans 9:06

This change does not necessarily remove the ability of using locales for formatting numbers, because PHP still has the lowercase F as format specifier for printf. And sprintf and friends. Does PHP have other ways of rendering numbers according to locales?

George Banyard 9:24

According to locales? I don't think so. You can format it something like manually, or the number format a class from the Intl extension.

Derick Rethans 9:35

Yeah, from what I understand, number_format, you have to do it all by yourself. And the intl extension doesn't support the posix or C locales from the operating system, right. It uses its own locale rule set from the Unicode project. The RFC lists some alternative approaches. Would you mind touching a little bit on these as well?

George Banyard 9:58

One of the alternatives approaches is to deprecate setlocale altogether. Because as a byproduct, this just fixes the issue because you can't define any locale anymore. So, there will always be locale independent. This has been discussed like in back in 2016, mostly because of the non thread safe behaviour. Because it affects global states and everything. But at the time, the conclusion was, because HHVM, like did a patch, making a thread safe, setlocale function was to mimic this patch and like implement it into PHP, which hasn't been done yet. Another one that we thought about was to deprecate kind of the behaviour and like raise a notice, like a deprecation notice, because that would happen like basically on every float to string conversion. The penalty, like the performance penalty, seemed pretty like strong. One other thing we considered was with Mate was to deprecate the current behaviour in some way. However, emitting a deprecation notice on basically every float to string conversion seemed not to be ideal. And just like flood, the log, the log output, and like also bring like a performance penalty because like outputting warnings isn't like most friendly thing to do performance wise.

Derick Rethans 11:21

What has the feedback been so far?

George Banyard 11:24

Feedback currently has been that like most people, well, one person because there hasn't been that much feedback.

Derick Rethans 11:30

There hasn't been that much feedback because you've only just proposed?

George Banyard 11:33

some of the feedback we got officiates the change However, they have concerns about like the modification of like, in every case for locales without having any upgrade paths. In some sense. It's just, oh, you have the change, and then you need to execute it and see what breaks. We may be currently considering like ways to figure that out, maybe by adding a temporary ini setting which would kind of be like a debug mode, where when you use that it would like emit notices when like this conversion would happen before and they would notice: Oh, this is not happening anymore. You need to like be aware of this change in behaviour

Derick Rethans 12:17

Did we not used to have E_STRICT for this at some point or E_DEPRECATED?

George Banyard 12:24

E_DEPRECATED is still a thing. E_STRICT got mostly removed with PHP seven. There've been like a couple of remaining notices which I got rid off or put back to normal E_WARNINGS or E_NOTICES in PHP seven point four. There were like two or three remaining. But yeah, like so that's one way to maybe approach it of like implementing a debug ini setting which would only be used for like dev because then where if you get like warnings and everything, you don't really care about the performance impact. And then in production, you would like disable that and the warnings wouldn't pop up.

Derick Rethans 12:56

How would that setting be any different from just putting it behind an E_DEPRECATED warning?

George Banyard 13:00

So with an E_DEPRECATED warning, we would need to show this behaviour, and we would need, and we could only change the behaviour in like PHP nine. Currently if we do that with like debug setting, we could change it with PHP 8.

Derick Rethans 13:13

That's a bit cheating isn't that?

George Banyard 13:15

Could say so.

Derick Rethans 13:16

I'm interested to see how this ends up going. Do you have any timeframe of when you want to put it for a vote?

George Banyard 13:23

Currently, we've only started this discussion. And I think until we figure it out, if we get like an upgrade pass, or multiple upgrade passes that we could then put into a secondary vote. I wouldn't expect it to go to voting that soon. Maybe end of April would be nice.

Derick Rethans 13:41

So around the time when this podcast comes out?

George Banyard 13:44

Ah! For once!

Derick Rethans 13:46

For once I got my timing right.

George Banyard 13:49

Yes. Don't you have like the string contain one which just got out.

Derick Rethans 13:53

Yes.

George Banyard 13:54

Then that vote close like last week.

Derick Rethans 13:57

Yeah, it's really tricky because there's so many, so many small now that I can't keep up.

George Banyard 14:02

Yeah, Mark also did like his debug.

Derick Rethans 14:04

Yeah. And there's like two or three tiny ones more that I would quite like to talk about. But by the time there's an opening in the schedule, it's pretty much irrelevant. So I'm trying to see whether I can wrap a few of the smaller ones just in one episode because there's the throw expression, the is literal check, and typecasting in array destructuring expressions, and all showed up in the last three days.

George Banyard 14:26

I suppose people have like, lots of time now. Now, it's a taint checker, basically, like I know, there's been like this paper by Facebook like six or eight years ago, which talks about how they kind of tried to implement in their static analyzer, but like, a static analyzer doesn't need to be something in the engine. That's what I don't really get.

Derick Rethans 14:45

Thank you, George, for taking the time this afternoon to talk to me about a locale independent float to string RFC.

George Banyard 14:53

Thanks for having me on the podcast again. Derick.

Derick Rethans 14:55

You're most welcome. Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 51: Object Ergonomics

PHP Internals News: Episode 51: Object Ergonomics

In this episode of "PHP Internals News" I talk with Larry Garfield (Twitter, Website, GitHub) about a blog post that he was written related to PHP's Object Ergonomics.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 51. Today I'm talking with Larry Garfield, not about an RFC for once, but about a blog post that he's written called Object Ergonomics. Larry, would you please introduce yourself?

Larry Garfield 0:38

Hello World. My name is Larry Garfield, also Crell, CRELL, on various social medias. I work at platform.sh in developer relations. We're a continuous deployment cloud hosting company. I've been writing PHP for 21 years and been a active gadfly and nudge for at least 15 of those.

Derick Rethans 1:01

In the last couple of months, we have seen quite a lot of smaller RFCs about all kinds of little features here and there, to do with making the object oriented model of PHP a little bit better. I reckon this is also the nudge behind you writing a slightly longer blog post titled "Improving PHP object ergonomics".

Larry Garfield 1:26

If by slightly longer you mean 14 pages? Yes.

Derick Rethans 1:29

Yes, exactly. Yeah, it took me a while to read through. What made you write this document?

Larry Garfield 1:34

As you said, there's been a lot of discussion around improving PHP's general user experience of working with objects in PHP. Where there's definitely room for improvement, no question. And I found a lot of these to be useful in their own right, but also very narrow and narrow in ways that solve the immediate problem but could get in the way of solving larger problems later on down the line. So I went into this with an attitude of: Okay, we can kind of piecemeal and attack certain parts of the problem space. Or we can take a step back and look at the big picture and say: Alright, here's all the pain points we have. What can we do that would solve not just this one pain point. But let us solve multiple pain points with a single change? Or these two changes together solve this other pain point as well. Or, you know, how can we do this in a way that is not going to interfere with later development that we've talked about. We know we want to do, but isn't been done yet. So how do we not paint ourselves into a corner by thinking too narrow?

Derick Rethans 2:41

It's a curious thing, because a more narrow RFC is likely easier to get accepted, because it doesn't pull in a whole set of other problems as well. But of course, as you say, if the whole idea hasn't been thought through, then some of these things might not actually end up being beneficial. Because it can be combined with some other things to directly address the problems that we're trying to solve, right?

Larry Garfield 3:07

Yeah, it comes down to what are the smallest changes we can make that taken together have the largest impact. That kind of broad picture thinking is something that is hard to do in PHP, just given the way it's structured. So I took a stab at that.

Derick Rethans 3:21

What are the main problems that we should address?

Larry Garfield 3:24

So the ones that identify that people have been talking about are the following. One is constructors are just way too verbose. If you've looked at almost any PHP class, in almost any framework, the most common pattern is: you start with a class, you declare three to five properties that are private or protected. Then you have a constructor that takes three to five parameters and assigns each of those to those properties. Usually the names match all the way through, types match all the way through. It's all it's doing is shoving those parameters into properties. Right now, you have to repeat each property name four times total. It's just way too verbose. It's just more typing than we should be doing. And so there have been various proposals for ways to have to type less to do that.

Derick Rethans 4:11

We'll get to the solutions in a moment, I'm sure.

Larry Garfield 4:14

The next one is what I've called the bean problem. So I've referenced to Java beans. For those who have not worked with Java before. And I haven't worked with it in a long time. But when I last did, this was standard, you'd have what's called a Java bean, which is just a Java class that has a bunch of properties that are private, and then a getter and a setter for every single one of those properties. PHP, you see the same pattern a lot, especially in ORMs. Largely that comes down to this makes serialisation and deserialization straightforward because you can access properties through a method, you know, the names, automatic naming and so on. But that's again, an awful lot of typing to bypass the private and protected keyword. So how can we reduce the mental overhead of that and just have access to what we need to with less work. That relates to a lot of the reasons for that is immutable objects. So it's been increasingly popular in PHP in recent years to have objects that even though the language doesn't support immutability are effectively immutable, in that the object doesn't give you a way to change its properties. But it gives you a way to create a new object that is the same, but with certain changes. Think DateTimeImmutable in PHP core, or it has a modify() method, which doesn't change the objects in place. You see, if you call a DateTimeImmutable object, call it with the modify() method with a parameter of plus one week you get back a new DateTimeImmutable object, that is the timestamp one week later. That pattern is increasingly common. PSR-7, the HTTP messages spec uses that a lot of other packages have started doing it. The way that usually ends up working is these wither methods. It's with some value, with some some property name and so on, similar to a setter, but it returns a new object and there's a common pattern for that now. Another problem is materialised values, where you have something that conceptually is a property. And to a outside caller, it really should just be a property. But you want to not have it be a full property itself. The example I use the kind of the canonical example is you have a first name property and a last name property and you want to format a full name property. There's a lot of cases like that. Right now, you do that as a method, and you have some kind of static cache internally. Which works. It's just: Can we make that better? And can we not make it worse with any of these other changes? A lot of this comes down to how do we make not make any of these problems worse. Another problem is, for lack of better term, and what I call the documented property problem, where if you have a large constructor, then you're going to pass in a bunch of different values because they all map to properties, but you need to keep track of: Okay, which one of these is which? And especially comes up for value options, rather than service objects. Were introduced in C, or Rust or Go would just be a bare struct, essentially, which PHP doesn't have. And we can get to why I think that's okay, we don't have. But objects where you really just have a combination of properties, and that's okay. But you still need to keep track of them, you want to be able to create an object that has only some of them. And if you have eight optional properties, and you want to just set the last one, right, now you have a bunch of nulls or question marks, or empty quotes, or zeros, or whatever default value, and again, it's just very cumbersome. And so the kind of the question I was looking at is, how can we make all of these better and not make any of them worse? That's kind of the problem space. I think most people can relate to, at least most of these.

Derick Rethans 7:46

I would think so to certainly in some of my code, where that's been the case. Hopefully, that was all the problems you found.

Larry Garfield 7:53

I think I got all of them.

Derick Rethans 7:55

As I alluded to, in the introduction, there have been quite a few smaller RFCs already to address some of the problems that you just mentioned. Which you list and as well as others in things that you have found that multiple people currently already do. Should we have a quick look at what these things are?

Larry Garfield 8:15

One of the proposals that I looked at was writeonce properties, as we are recording this, there's an RFC for that that's in voting. Although it looks like it's probably not going to pass that the vote stays where it is. Now, the idea there is allow typed properties to have a read only marker on them just like the type or public or private, and then they can only be written to once if they're uninitialised you can write to them, after that they're just stuck that way. The advantage is that would make them safe to expose publicly. And so you can have a property that you can expose to the world just access a property but not be concerned about someone changing it out from under you. The downside of that mainly comes down to that evolvable immutable object where that with method then becomes a lot harder, because you can't say: clone this object and change this one property because well, you can't change this one property, you'd have to fully construct a new object. There's also two different proposals that have been floated recently for compact object property assignments. I think they have different names for the same basic idea. Basically, if an object has public properties, being able to write to those in one shot in a code block, along with the constructor in a named fashion. It's essentially there's a common pattern now where you pass an associative array to a function which has a bunch of named properties, and then you can put them in whatever order you want. And then you know, dissect those and map those to properties internally. It's essentially taking that idea and baking it into the syntax, which does help and gives you when you have a lot of properties that are optional. It makes it a lot easier to you have a lot of properties defined or a lot of parameters defined it makes it a lot easier to piecemeal select them. The downside is all of those proposals to date only work on public properties, which have a long list of challenges with them. It also means you're bypassing any kind of validation around this property is only valid if this property is set, or this property has to be less than this property, and so on. Those are too limiting, but definitely they're trying to solve a real pain point.

Derick Rethans 10:19

Nor can you enforce types through that, of course.

Larry Garfield 10:21

Some of them I think, might be able to

Derick Rethans 10:23

I meant associative arrays.

Larry Garfield 10:25

Yeah, the associative array approach you can do now, which is really the only possible thing I can say in its favour is that it works today. Type enforcement isn't there, it's poor for documentation. Please don't do that. All these are dancing around names parameters, which is a different language feature that's been discussed on and off for many, many years. I don't know of any current RFCs on the table for this one, but it's come up many times. Number of languages have this Python has it for example, where give or take whatever syntax instead of specifying, call this function with parameters, one, seven and 19, and then you have to guess what those numbers mean, you can call a function with count equals one, order equals ASC, whatever. And then you can reverse the order, change the order around. It's essentially the same idea. But for function parameters rather than Object Properties. Again, there's implementation challenges there. But certainly there are languages that do it successfully. Another problem space people have been looking at is access control. So we mentioned the the read only property. In the discussion for that Nicholas Grekas, made a suggestion for having instead of having a read only flag, allow the access control on a property to be different for read and write. So you could have a property that is publicly readable but not writable. But private writable, or private and protected writable. That gives you many the same benefits as the read only flag would have, but without breaking some of the current patterns we have around cheap cloning of objects and so forth.

Derick Rethans 11:58

Because of course in PHP, PHP's object oriented system is based on classes, not on objects. You can access read and write private properties of other objects as long as they have the same class.

Larry Garfield 12:10

Correct. And that's something that we take advantage a lot of in cloning, to hold wither method style is based on that. If that feature of PHP went away, it would break an awful lot of code. So don't change that. Other things have been on the table. People have talked in the past about constructor promotion, which is a feature that a couple of languages have including Hack, which is the Facebook PHP fork. The basic idea there is, instead of repeating properties once for their declaration, once in the constructor, and then twice in an assignment, you just declare them as part of the constructor. And it becomes essentially a macro to expand that out to the same original code. Hack already has a syntax for that. This one actually has been a proposal for PHP before and it didn't pass.

Derick Rethans 12:57

Was it proposed in the exact same syntax as Hack? I don't believe so because Hack had types at the moment, and PHP did not.

Larry Garfield 13:05

The earlier syntax, I was just looking at that RFC earlier today, used public function constructs this arrow foo, comma, this arrow bar. And then you still had to declare the properties independently, so it only solves half the problem. And the syntax looked kind of weird. The Hack syntax just lets you put the entire property declaration in place of the parameter in the constructor line, and it fills in all of the other pieces. You have public function, construct, parentheses, private int, a number, private bar, some bar object, and so on. And it would automatically create that property on the class and take the parameter and promote it and do the assignment for you. So that's what Hack does. I believe TypeScript has something similar, although I haven't worked with it. It's again just simplifying that common case. Another non PHP place I look for inspiration is Rust, because Rust does immutable objects very well. And so I figured, alright, let's let's look what other languages are doing. What Rust does, they have objects that are more bare than PHP does, much like Go where it's really a struct to which you can attach methods rather than an enclosed object, but they let you create a new object. Here, the object constructor syntax is essentially named parameters already, you're essentially providing a Json like block of this property of this value, this property should have this value, similar to the object constructor proposals. But you can then say, dot dot some other object of the same type, which Rust reads as: and fill in anything I haven't specified with the values from this other object. The fallout of that is making new object that is the same as this other object, but for this one change really easy. Could we do something like that either using Rust syntax or something else just conceptually, would that work to make with the with style methods easier, possibly would it help bypass the problems with a read only flag and so on. Finally, kind of the granddaddy of them all proposal in PHP from a couple of years ago is property accessor methods. This is a very contentious RFC, it didn't pass mostly for performance reasons, as I understand it. But the idea here was you could declare a property to have a dedicated getter and setter method. And then when you try to read or write a property, that method gets called transparently in the background. It's essentially the same idea as the magic get and magic set methods on objects, but specifically for each property, which can then eliminate a lot of: if we're talking about this property, if we're talking about that property gives you a lot more flexibility. It also allows you to then, because those are methods, control the access of those methods separately for get and set. So you can have a public getter and private setter method. A number of other languages have this, Python does, JavaScript does. So I included that okay, this has been a proposal on the table before, I personally really like it. The only downside is the performance impact because since people can't really know in advance if a property it's going to be accessing is guarded by methods like this or not, it means every property access, therefore has an extra if statement around it in the engine. And the performance impact of that, well, small, individually, really adds up when you're talking about 10s of thousands of property accesses. As I understand that, that was the main reason that it didn't pass before. I don't have a good solution for the performance issue. Unfortunately, it would be delightful if you know the typing system would let us do that. Or if the JIT would do something there. I have no idea that's well out of my wheelhouse.

Derick Rethans 16:34

That's lots of solutions that people have come up with in the past and haven't made RFCs for yet. Solving them all one by one, as you mentioned isn't particularly useful thing to do. Because, as you say, you end up in a jumbled mess of things. Your article continues to have an analysis section about all the different aspects of all the different problems and solutions that we've just mentioned here. What's your thinking here, how to join up all the dots?

Larry Garfield 17:00

My goal was alright, as I said, what's the minimum amount of change we can do, that gets us the maximum benefit and solve as many problems as possible without making anything worse? Is there a way that we can make some problems not their own problem, but the result of some other problem? Can we make one a degenerate case of another and thereby solve, kill multiple birds with one stone essentially? What I came up with was: one, constructor promotion on its own, I think is very useful. Let's do that. Named parameters on their own are very useful, let's do that. The combination of constructor promotion and named parameters together gives us the equivalent of a object initialization syntax. The specific symbology in the syntax may look slightly different. But essentially you get the same net effect where you could say, hey, new product object and pass it a series of key values and you're done. And the object itself is defined as just a bunch of key values in the construct statements, and no body, and that still gets promoted. So we end up with struct like, or record like objects with relatively little syntax as kind of a side effect of these two other changes that have good arguments for them on their own.

Derick Rethans 18:14

And also without introduce a new concept such as struct.

Larry Garfield 18:18

Exactly. There's also discussion about, should we just introduce a separate language construct for a struct or a record, that is just their properties, possibly some validation, they will pass by value instead of by reference, which makes immutability easier, to design those for immutability. I've toyed with that idea in the past. And every time I come down to eventually I'm going to want to do everything that classes do anyway. Or if they do something special, I'm going to want to do those in classes, except for the way they pass. Legitimately, there's cases where we would want to have a value object that passes in a more by value style instead of the pseudo reference that objects passed today. There are use cases for that, that's really the only difference. Everything else is essentially the same in both cases, it's more work than is needed to try and create a whole separate construct there. Instead, let's make this one construct flexible enough that we can use it in either way, at whatever use case makes sense. I think those two changes together give us the most bang for the buck and don't harm anything else.

Derick Rethans 19:16

Both of these two proposals help to solve the first problem that you have outlined, which is the problem with constructing objects. So the other problem that we spoke about is the value object and access to properties for example. Have you come up with a solution of which proposals would work towards solving that problem as well?

Larry Garfield 19:36

My proposal on that front, based on what's available, is so I like Nicholas's idea of separate access control for read and write. Okay, now what syntax can we use for that that is going to be self explanatory and readable and not block property accessors if we ever get to the point of figuring out how to do those performently. I don't think we can go all the way to property accessors right now, I would love to, but I don't think that's feasible. Instead, we can borrow some of the syntax from that proposal and let you declare hard to explain this in verbal format. It's like: string name, curly brace, public get, private set, curly brace. Which is essentially the syntax that the property accessor proposal RFC had, but with the method bodies removed, which that RFC actually supported anyway. And what that gives us is then a syntax to say, this property has different visibility for reading and writing, for get and for set, in a way where it's natural to be able to add in functionality to that later for getters and setter methods. If we figure out how to do it. There are probably other syntaxes that could do the same. I'm flexible. I think the key here is some sort of syntax that gives us that split visibility in a way that opens itself to future extension, rather than just throwing more keywords before a property and hoping it works out for the best. And once you've done that, then I think it's worth it to consider: could we do some kind of Rust like cloning or Rust like creation process? I don't know. It could be a variant on cloning. People have proposed a clone this with and then list of properties. And that, essentially de-sugars into creating that new object and then calling a bunch of property set commands. Maybe that's viable. Maybe it's not I'm not sure. Maybe using a syntax closer to what Rust has so that certain thing parameter lists can get auto populated, I don't know. But I think that's an area worth exploring, and would be a nice add on to these others, but it's not a prerequisite. The thing I like about what I'm proposing here, each of these individual pieces carries value on its own. And there's a good reason to vote for each of these on their own, but they dovetail together so that the whole is greater than the sum of the parts. And I think that's the mark of good design where you don't solve each individual problem. You have tools that together solve several problems. It just kind of falls out of the design.

Derick Rethans 22:06

Of course, at the moment you wrote this blog post, none of these proposals had more to it than your description in your article.

Larry Garfield 22:15

Some of them had old RFCs that had been proposed and either didn't make it to a vote or the vote gone slightly negative for various reasons. But yeah, I did not have any patches. My C skill is still extraordinarily limited. That this was a discussion starter, not a here's an RFC with code.

Derick Rethans 22:32

Of course, we are no day and a half or two days later. And now there is of course, an RFC for one of them, which is the constructor promotion, which pretty much as we spoke about earlier, picks up Hacklang's syntax and ports it to PHP.

Larry Garfield 22:47

Yes, I've concluded that my primary role in PHP internals is inspiring Nikita to go write things.

Derick Rethans 22:53

And you were successful in this case.

Larry Garfield 22:56

A year ago, I was on this podcast with you talking about comprehensions, when I was pushing for those, and those never happened. But out of that discussion, Nikita noticed, oh yeah, short lambdas I should go finish those and then went and finished that RFC. My role is convincing Nikita, he should do things. So I consider that a worthwhile contribution.

Derick Rethans 23:13

Fair enough. I agree. Anyhow, it would be interesting to see where this ends up going. We are about, what three, three months away from PHP 8.0's feature freeze. So there's plenty of time to look at these other three proposals that you concluded would be great to have altogether.

Larry Garfield 23:32

I'm happy to work with anyone who actually does know, working on internals on any of these. Personally, I think the asymmetric visibility is the next one after constructor promotion. That's straightforward to do. I know Levi Morrison on the lists has suggested that named parameters has a lot of other gotchas around it that I didn't get into here. And that is very likely. There may very well be implementation reasons why these are harder than I present them as. I fully acknowledge that. But again, if any of these individually, I think still moves the language forward in a way that doesn't close off future avenues.

Derick Rethans 24:07

Do you think you'll end up learning some C to be able to work on this yourself?

Larry Garfield 24:11

So I used to work in C briefly, 16 years ago. I had a very, very short career writing software for Palm OS.

Derick Rethans 24:18

And I remember us talking about it, when we recorded episode last year.

Larry Garfield 24:22

And I did some C again, just recently, while playing with FFI. As we've discussed before, the PHP engine is not written in C, it's written in a macro language that is written in C. There's a learning curve there that I have yet to scale.

Derick Rethans 24:34

Fair enough.

Larry Garfield 24:35

If someone wants to mentor me in that while we work on one of these, I am very open to that. So putting that out there.

Derick Rethans 24:40

You might be inundated by messages now, you never know.

Larry Garfield 24:43

Better that then getting ignored

Derick Rethans 24:45

Do you have anything else to at?

Larry Garfield 24:46

I think it's beneficial for PHP collectively to take this broader approach of, not just okay, what can solve this immediate problem in front of us, we can scratch this one itch, but what are all the itches that we have that need to get scratched? And how can we solve all of those in a way that is going to have the best bang for the buck. And let us do the least amount of work at the least amount of syntax, least amount of conceptual overhead, and yet give us the most flexibility. And there's been a lot of talk anytime we're talking about the PHP type system of we eventually want generics, generics are hard. But let's make sure that whatever we do, doesn't make generics even harder. I think that's good that we have this goal in mind. And we're: all right, what iterative steps get us closer to that without locking us, in without painting us into a corner. And that's kind of what I'm trying to do here. And I would very much encourage everyone working on PHP to take that approach of: don't solve the immediate problem, look at the broader picture, what will solve multiple problems, what will dovetail nicely with something else and what kind of big picture plan in architecture we can look at that ends up making the language better rather than just looking at our feet.

Derick Rethans 25:57

Well, thanks for taking the time this afternoon to come and talk about the object ergonomics. We'll see how much of it ends up in PHP eight.

Larry Garfield 26:05

Fingers crossed.

Derick Rethans 26:07

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week.


PHP Internals News: Episode 50: The RFC Process

PHP Internals News: Episode 50: The RFC Process

In this episode of "PHP Internals News", Henrik Gemal (LinkedIn, Website) asks me about how PHP's RFC process works, and I try to answer all of his questions.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 50. Today I'm talking with Henrik come out after he reached out with a question. You might know that at the end of every podcast, I ask: if you have any questions, feel free to email me. And Henrik was the first person to actually do so within a year and a half's time. For the fun, I'm thinking that instead of I'm asking the questions, I'm letting Henrik ask the questions today, because he suggested that we should do a podcast about how the RFC process actually works. Henrik, would you please introduce yourself?

Henrik Gemal 0:52

Yeah, my name is Henrik Gemal. I live in Denmark. The CTO of dinner booking which does reservation systems for restaurants. I've been doing a PHP development for more than 10 years. But I'm not coding so much now. Now I'm managing a big team of PHP developers. And I also been involved in the the open source development of Mozilla Firefox.

Derick Rethans 1:19

So usually I prepare the questions, but in this case, Henrik has prepared the questions. So I'll hand over to him to get started with them. And I'll try to do my best to answer the questions.

Henrik Gemal 1:27

I heard a lot about these RFCs. And I was interested in the process of it. So I'm just starting right off here, who can actually do an RFC? Is it anybody on the internet?

Derick Rethans 1:38

Yeah, pretty much. In order to be able to do an RFC, what you would need is you need to have an idea. And then you need access to our wiki system to be able to actually start writing that, well not to write them, to publish it. The RFC process is open for everybody. In the last year and a half or so, some of the podcasts that I've done have been with people that have been contributing to PHP for a long time. But in other cases, it's people like yourself that have an idea, come up, work together with somebody to work on a patch, and then create an RFC out of that. And that's then goes through the whole process. And sometimes they get accepted, and sometimes they don't.

Henrik Gemal 2:16

How technical are the RFCs? Is it like coding? Or is it more like the idea in general?

Derick Rethans 2:23

The idea needs to be there, it needs to be thought out. It needs to have a good reason for why we want to add or change something in PHP. The motivation is almost as important as what the change or addition actually is about. Now, that doesn't always get us here at variable. In my opinion, but that is an important thing. Now with the idea we need to talk about what changes it has on the rest of the ecosystem, whether they are backward compatible breaks in there, how it effects extensions, or sometimes how it effects OPCache. Sometimes considerations have to be taken for that because it's, it's something quite important in the PHP ecosystem. And it is recommended that it comes with a patch, because it's often a lot easier to talk about an implementation than to talk about the idea. But that is not a necessity. There have been quite some RFCs where the idea was there. But it wasn't a patch right away yet. It is less likely that these RFCs will get accepted, because in order to get something into PHP not only needs to be there a good idea, that also needs to be there a good implementation of it. If you have been a long term contributor to PHP, then you should know how to write a patch yourself. In other cases, you'll see people that have an idea try to find somebody else to do and work on the implementation together. But all RFCs, if they get accepted. It's always pending a good implementation.

Henrik Gemal 3:52

How is an RFC actually done? Is that like a template you fill out or is it like a website or how does it work?

Derick Rethans 3:59

Our Wiki, I will add a link to that in the show notes, has a template of how to create an RFC. It has a set set of sections. There's always an introduction that basically lays out what it is about or why this change is being made. Then there is often a proposal of what the change actually is. And then there's a few sections that are sometimes empty or sometimes are filled in such as, at least backwards incompatible changes, for which PHP version is been targeted, what the impact is to all the parts of the PHP ecosystem. But these things are not always necessary, because they don't always make sense to do right? If you want to add a new syntax to PHP, then that almost never influences existing extensions, but it will influence OPCache, for example. And then there's also often things like open issues, things we haven't quite thought through yet. A bit of a discussion, discussion bits will get filled in after people in the PHP internals list, which I'm sure we'll get to in a moment, come up with better ideas or alternatives sometimes, and then things like future scope will also be part of the template. We don't really require a very rigid approach to this, but we do appreciate if all the sections are filled in, or at least thought about in such a way that there's either information or not information. And then at the end, there's often a proposed voting choice. Everything at the moment needs to pass by two thirds majority before it gets accepted. So yeah, those are the things in the template itself. But the template is important. And you do need to fill it in, if you want to propose an RFC.

Henrik Gemal 5:33

Are all RFCs public or do you have like private RFCs?

Derick Rethans 5:38

All RFCs have to be public, otherwise they can't be voted on. But some RFCs start out of just a conversation with a few developers coming up with an idea. In the last few months, some more complicated RFC start out on a GIT repository. As a pull request, they never get merged anywhere. Because on GitHub, it makes it much easier to comment on specific sections for adopting feedback. Instead of having large discussions on the PHP internals mailing list, where sometimes comments might just get lost because there's too much text in there. Even though these RFC start out, while they're still sort of public, but nobody knows about them. In the end, they will always have to be public otherwise there won't be any voting, done on it, and it won't get accepted.

Henrik Gemal 6:27

Where's the RFC sent to and who's kind of in charge of the RFC? Is the one that makes the RFC or is it like a RFC commander?

Derick Rethans 6:37

The person that makes the RFC is responsible for guiding it through the whole process that we have. Once they are finished, there is a requirement for you emailing the PHP internals list with a specific prefix, which I think is RFC in square brackets. And then that starts a minimum discussion periods of two weeks. That discussion period might end up longer, in cases, lots of things to talk about or discuss or lots of disagreements, but the discussion period has to be a minimum of two weeks on the PHP internals mailing list.

Henrik Gemal 7:09

I was wondering a little bit about the priority RFCs because I see RFCs as like, a little bit like feature requests. So wondering who actually decides on the priority of an RFC?

Derick Rethans 7:23

Nobody really decides on the priority. Multiple RFCs can go through the process at the same time, you don't really have a priority of which one is more important than others. So yeah, there's nothing really there for it.

Henrik Gemal 7:35

I was just wondering if it's done like a normal project, you know, there might be many RFCs at the same time. I'm wondering how many kind of RFCs are there at the moment, are we talking 10 or are talking thousands?

Derick Rethans 7:50

This depends a bit on where in PHP's release cycle we are. PHP should get released at the end of November or the start of December. In all PHP seven releases that actually has happens. Usually the period between December and March, there will be like maybe one or two a week, which is great because that makes it possible for me to pick the right one to make an episode out for the podcast. At the moment, there are 10 outstanding RFCs. That means there are so many that I don't actually have enough time to talk about all of these on the podcast. However, they are often more just before we go to feature freeze, which happens at the end of June. So there's still two months to go. But you also see that over the last two years, there's a lot more smaller RFCs than there are big RFCs. So big RFCs like union types. They tend to be early in a release cycle, where smaller RFCs, as an example here, there's currently an RFC that there is no episode about, that suggests to do a stricter type checks for arithmetic or bitwise operators. Those are tiny, tiny changes. And in the last two years, there have been more and more smaller RFC than bigger RFCs because they tend to limit the amount of contention that people can disagree with and hence, often makes it easier to then get accepted. That is a change that I've been seeing over the years. But no, there are no thousands for each PHP version, I would say on average, there's about one a week, so about 50.

Henrik Gemal 9:19

I want to get a little bit into the voting part, because that sounds kind of interesting, who can actually vote?

Derick Rethans 9:28

After the two week minimum discussion period is over on the PHP internals mailing list, an RFC author can decide to put up the RFC for a vote. And that also requires you then to send an email to the PHP internals mailing list prefixing your subject with the word vote in capital letters. Now at this moment, you unfortunately see that people start paying attention to the RFC. Instead of doing that during the discussion period. At a moment of vote gets called you shouldn't really change RFC unless it's for like typos or like minor clarifications to things, you can't really change syntax in it for example. People can vote our people with a PHP commit access. And that includes internals developers, documentation contributors, and people that do things in the infrastructure. Everybody that has a PHP VCS account and VCS, version control system, that used to be CVS and now then SVN, and now GIT, as well as people that have proposed RFCs. So the group that technically could vote is over 1000 big, but the amount of people that vote is very much under 50 most of the time. We don't really have any criteria beyond you have to have an account to be able to vote in PHP RFCs.

Henrik Gemal 10:43

How is the voting actual done?

Derick Rethans 10:47

Since about last year, each RFC needs to be accepted, with a two thirds majority. On each RFC on the wiki, once a vote gets called you as an RC alter needs to include a small code snippets that then creates a poll. Very often do we want this? Or do we not want this? So it's a yes or no question. But sometimes there are optional votes, whether we want to do it a specific way, or another specific way. Sometimes that allows you to then select between different syntaxes. I don't think that is necessarily a good idea to have. I think the RFC author should be opinionated enough about picking a specific syntax. It is probably better to have a secondary vote as we call those. Those secondary votes don't to have two thirds majority is often which one of the options wins out of these. But the main RFC won't get accepted, unless there's a two thirds majority with a poll done on the wiki.

Henrik Gemal 11:46

What happens after the vote? You know if it's both if it's Yes or no?

Derick Rethans 11:53

I'll start with the easy case, the no case. If it's a no then the RFC gets rejected. That also means that sometimes an RFC fails for a very specific reason. Maybe some people didn't like the syntax, or it was like a one tweak where it would behave in a wrong way or something like that. But as a rule that says that you cannot put the same subject back up for discussion for six months, unless there are substantial changes. Now, this has happened with scalar type hints, for example, and a few other big ones. If an RFC gets accepted, then pending on whether there is an implementation, the implementation will get set up as a pull request to the PHP project on GitHub. And then the discussion about the implementation starts. If the implementation doesn't get to the point where it is actually good enough, or whether it can actually not be implemented in a way that it doesn't impact performance, it still might end up failing, or might not get merged. And in some cases, it means that a feature will get added at some point but it might not be necessarily in the PHP version that it got targeted for. I don't actually have an example for that now. If the implementation is already good and already discussed it can get merged pretty much instantly. And then it will be part of the next PHP version.

Henrik Gemal 13:08

How many RFCs voted on every year? And what majority voted yes or no?

Derick Rethans 13:16

I don't have the stats for that. But there is a website called RFC watch, where you can see which RFCs had been gone through and which one had been accepted or not, in a nice kind of graph way. I will add a link in the show notes for that. I would guess that during a year, about 50 RFCs are voted on. And I will think that about half of them are passing. But that's a guess I don't have the stats.

Henrik Gemal 13:42

Thank you very much for the answers. It brought me closer to the whole process of the PHP development. You have any other things to add?

Derick Rethans 13:52

I don't think so at the moment. I think what we she'd be a bit careful about is that although we're getting closer and closer to feature freeze at the end of June. We currently have just elected the new PHP eight zero release managers, but I keep the names secret, because this podcast is recorded in the past. They are going to be responsible now for doing all the organisatorical work for PHP eight zero. And that also means that feature freeze will happen at the end of June somewhere. And I expect to see a bunch of RFCs coming up with just enough time to make it into PHP eight zero, or not. So that's going to be interesting to see what comes up there. But other than that, I think we have explained most things in the RFC process now. And I thought it was a fun thing for once somebody else asking the questions and me giving the answers. And I think in the future, I think I would like to do like a Q&A session where I have multiple people asking questions about the PHP process. I also thought this was a good experiment and thanks for you taking the time to ask me all dthese questions today.

Henrik Gemal 15:00

No problem. I love your podcast. I listen to it whenever I bike to work. It's nice to get some insights into the PHP development.

Derick Rethans 15:10

Yeah, and that is exactly why I started it. Thank you Henrik for taking the time this morning to ask me the questions. And I hope you enjoyed it.

Henrik Gemal 15:18

Thank you very much for having me on the show.

Derick Rethans 15:22

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 49: COPA

PHP Internals News: Episode 49: COPA

In this episode of "PHP Internals News" I converse with Jakob Givoni (LinkedIn) about the "Compact Object Property Assignment", or COPA for short, RFC that he is proposing for inclusion in PHP 8.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 49. Today I'm talking with Jakob Givoni about an RFC that is made with a very long name, the compact object property assignment RFC or COPA for short. Jakob, would you please introduce yourself?

Jakob Givoni 0:39

Yes, my name is Jakob. I'm from Denmark, and I've been working programming in PHP for 20 years now. I work as a software engineer for a company in Barcelona that's called Vendo. I got inspired to get involved in PHP internals after I saw you as well as Rasmus and Nikita in a PHP conference in Barcelona last November.

Derick Rethans 1:00

there was a good conference, I always like going there. Hopefully, they will run it this year as well. What I'd like to talk to you about today is the COPA RFC that you've made. What is the problem that this is trying to solve?

Jakob Givoni 1:14

Yes, I was puzzled for a long time why PHP didn't have object literals. And I looked into it. And I saw that it was not for lack of trying. Eventually, I decided to give it a go with a different approach. The basic problem is simply to be able to construct, populate, and send an object in one single expression in a block, also called inline. It can be like an alternative to an associative array. It gives the data a well defined structure, because the signature of the data is all documented in the class.

Derick Rethans 1:47

Of course, people abuse associative arrays for these things at a moment, right? Why are you particularly interested in addressing this deficiency as you see it?

Jakob Givoni 1:57

Well, I think it's a common task. It's something I've been missing, as I said inline objects, obviously literals for a long time, and I think it's a lot of people have been looking for something like this. And also, it seemed like it was an opportunity that seemed to be an fairly simple grasp.

Derick Rethans 2:14

What kind of solutions do people use currently, instead?

Jakob Givoni 2:18

I think, very popular one is the associative array where you define key value pairs as an array. The problem with that is that you don't get any help on the name of the indexes nor the types of the values.

Derick Rethans 2:33

I mean, it's easy to make a typo in the name, right? And it just either exists in the array suddenly, if you set it or you just get a random null value back. As you said, yeah, there's no way of enforcing the type here, of course. COPA compact object property assignment is a mouthful, and it is a new bit of syntax to the PHP language. What is this new syntax going to look like?

Jakob Givoni 2:55

While it looks just like when you assign a value to a property, but here you can add several comma separated lines of property name equals value inside a square bracket block, which is coming after the array and the array arrow operator. The syntax shouldn't really conflict with anything else we have at the moment.

Derick Rethans 3:17

Because that's becoming more and more of a problem, right? Finding new bits of characters to use for new syntax. It is something that came up with annotations or attributes as well.

Jakob Givoni 3:27

And then to start talking about, does this look like typical PHP? Or do you just like this syntax? Or do you hate it? It becomes a taste based thing. For me, the important thing is that if it works, and if it's fairly trivial to implement, I don't have a problem with it.

Derick Rethans 3:43

There was a related RFC early in the year which was called the object initializer RFC. How is your proposal different from that one?

Jakob Givoni 3:51

The object initializer is a new concept. Mine is different in in that I didn't want to introduce any new concepts. My approach was focused on pragmatism. In that other RFC, the initialization is done at the construction time. And you can kind of do it without even having to define your constructor. And one of the most important aspects of that one was to enforce that all the mandatory properties have been initialised. Because you can have type properties in PHP 7.4. If they don't have a value, then there is introduction of this new state of uninitialized properties. And the author of that RFC wanted to make sure that once the object was ready was fully constructed, it would validate that there was nothing missing there. So it has like six out of seven characteristics in common with mine, and one characteristic that is different. I looked into this about the mandatory promises and I didn't find a simple way or an obvious way to handle it. I have one idea if this COPA should pass and I have another idea if it fails. I didn't want to include that it was not part of my main goals.

Derick Rethans 5:01

I'm looking at the syntax here for a bit. And it seems that way how you can do this COPA block. If you have an object, you use the arrow which is dash greater than sign square brackets, and then the list of properties that you want to assign values to. And the RFC shows that to be equivalent to doing each line manually yourself. Does that mean that it is only works for public properties?

Jakob Givoni 5:31

No, it would work also, for what do you call it, virtual properties that don't actually exist, or if they're private, it would just invoke the magic set method in that case. The same thing would happen as if you were to do the assignment line by line as in the example.

Derick Rethans 5:48

Without there being the underscore underscore set method set, it means that you can only really set the public properties in that case.

Jakob Givoni 5:56

You won't be able to set private or protected properties directly unless the magic method does that.

Derick Rethans 6:03

So does that mean that it is pretty much only something that happens in syntax, and it doesn't have any other side effects or any other functionality that you wouldn't already be able to do?

Jakob Givoni 6:15

Yeah, it's just a new syntax for that. The emphasis here was pragmatism. So not introducing any new concepts.

Derick Rethans 6:23

What would use cases for this be?

Jakob Givoni 6:25

Typically, as I mentioned, they're data transfer objects, value objects. Those simple associative arrays that are sometimes used as argument backs to constructors, when you create objects. Some people have given some examples where they would like to use this to dispatch events or commands to some different handlers. And whenever you want to create and populate and and use the object in one go, the COPA should help you.

Derick Rethans 6:58

I suppose COPA would also work for standard class objects?

Jakob Givoni 7:02

It's an object just like anything else. So yeah, yes, there shouldn't be any surprises.

Derick Rethans 7:07

But of course, it doesn't really make a lot of sense to use standard class because then again, of course, you don't have the benefits of checking your property names or types, again, of course. Are the other use cases you can think of?

Jakob Givoni 7:19

Why don't have anything else in mind.

Derick Rethans 7:22

I remember quite a long time ago, because this is a subject that comes up quite a bit. That's pretty much people that write PHP code abuse associative arrays so much. Just like the object initializers RFC, as well as your COPA RFC, try to use objects in a different way to be able to prevent developers from abusing associative arrays, pretty much as more stricter data types. In languages like C, there's a distinct datatype for this is called a struct. Do you think it would make sense that instead of trying to overload our object semantics, then in stats use, or introduce something like a struct concept of that C or other kind of statically typed languages have?

Jakob Givoni 8:10

As I understand it, a struct is basically the same thing as structured as what I'm talking about structure set of data. However, I'm not sure if it's worth it to introduce a new concept. I don't know if it's necessary if it's possible to reuse the things that we already have enough familiar with. I think I would prefer that you call it overloading the object. But I don't see a lot of problems with having an object that is simply a list of properties with values. It's a very basic object. An object doesn't need to have any methods, it's possible to use that. Every time we add a new concept like struct would be, I feel that it would lead to a combinatorial explosion of implications that later you need to assess every time you want another future change. I haven't seen any RFCs that have specifically mentioned structs. But it is a very related concept.

Derick Rethans 9:08

I'm just asking because I spent a lot of time in C where we have structs. But we don't really have objects or classes to begin with. It's more familiar for me to use that. And the other reason why I was asking is that perhaps it would be possible to create like a slightly more natural syntax, because, in my opinion, I think the one that you currently have chosen isn't particularly the most friendly one, but that's my own opinion here.

Jakob Givoni 9:33

There might be a window of opportunity, because curly brackets after the variable is going to be deprecated as a way as an array access. So maybe that could be used just curly brackets and dropping the arrow itself. That would look a lot more like like an object, I think, and it would also be shorter.

Right. I mean, PHP 7.4 deprecated these.

So the question is just how soon can we remove it and replace it to mean something else completely?

Derick Rethans 10:03

Yeah, that's a good question. I don't think I have the answer either. I guess it can be introduced as long as syntax that existed previously would now not do something different. And I think you would actually be okay here.

Jakob Givoni 10:15

I'm pretty sure it would throw a syntax error. If you try to run this code in a previous version.

Derick Rethans 10:21

I meant saying if you would reuse the curly braces, because as you said, they have been deprecated in PHP 7.4.

Jakob Givoni 10:28

I mean, if someone were not to follow that deprecation notice, that is now in place and would continue to keep their the code. If we change the implementation, it's better to get a clear, fatal error than to just have something really spurious happening.

Derick Rethans 10:45

Yes, absolutely, I definitely agree. Now, that's sort of what I was trying to get at, but you explained it more eloquently than I did. The RFC lists a few special cases. It talks about execution order and exceptions. I think some, somebody brought up somewhere that what happened If we're trying to set multiple properties through COPA and say the second out of three throws an exception. What would be the end state of the object for example? Could you talk a little bit through that?

Jakob Givoni 11:11

Regarding exceptions being thrown in any of those expressions where you are assigning, it's important to understand that the block of code that is COPA is not an atomic operation. Anything that happened before the exception will still have happened. And everything anything that happens after won't happen. Exactly like what you would expect if you were doing it line by line. Or if you were using method chaining to do several things on an object. I think it's going to happen what you would expect to happen unless for some, I think it might be unintuitive, that it's not an atomic operation. But it's just important to keep that in mind. That's why I listed it under special cases. And there's something similar with the execution order, in that you can list the properties in any order you like. It doesn't necessarily mean that you're going to get the same result if you change the order because you will be able to use the value of a previous assignment in the next one. Again, not 100% intuitive, but I think it might be worth the trade off in implementation and flexibility.

Derick Rethans 12:19

As you mentioned, there's no new semantics in there. Talking a little bit about implementation here. As there is no patch available, is this something that you'd be interested in developing yourself? Or are you looking for somebody else to help you out on that?

Jakob Givoni 12:32

I actually haven't contributed any code before. I'm not familiar with C. But one reason that I chose this RFC and this approach is also that if I can't get any volunteers, I might be able to learn and to do it myself, since it seems like it's mostly a parser syntax thing, probably should be able to pick that up.

Derick Rethans 12:53

I would also think because there is no new semantics in here, that it would instead be something in between, probably just the lexer that we have, the parser, and then constructing an equivalent abstract syntax tree or AST segment out of that.

Jakob Givoni 13:12

I would be thrilled to collaborate with someone to do some pair programming in order to get started if anyone is up for it.

Derick Rethans 13:18

So if you're listening to this episode, and you want to help Jakob out, why not get in touch with him? His contact details will be in the show notes for sure. The RFC also lists a few things that you have thought about, but you have decided not to either pick up into the RFC or you don't think they are in scope. Would we'll talk about that a little bit?

Jakob Givoni 13:36

There's some special things that you can do at the moment when you assign a value to a property. Things like using a variable to specify the property name, or to generate the property name from an expression using the curly brackets after the arrow. There's also array access directly on the properties, or increment, decrement, or nested object accesses. I don't think that these things are really essential. I've decided to probably leave it out of scope for now unless it's trivial. If it if it's trivial to implement that as well. It's okay with me. It's not deal breaker. But you have to do a cost benefit analysis. And I'm thinking that it could be a future scope. If there's a demand this can be addressed in a later RFC.

Derick Rethans 14:23

The RFC also talks about nested COPA. But it looks so complicated to me that I'm not sure whether it is actually something that we even should add to begin with.

Jakob Givoni 14:34

I don't think it's as complicated as it looks. So you can already already do nested COPA in if you create a new object inline as well as you of course, you can assign it to a property in the outer scope of the COPA. But if you want to over, to set just one property of a nested object, then you cannot do that directly. Well, you can do it actually if you access the previous one. Because you have access to the current property when you do their assignments. So you can see in my example that you can do it. But there might be a better syntax for doing that.

Derick Rethans 15:11

I'm happy to see that there's no backward incompatible changes. So that's always a win. What has been the feedback so far?

Jakob Givoni 15:17

Yeah, the feedback has been mixed bag as to say. There's some recognition that this has potential to be a useful feature. This is a critique of the syntax, as you also mentioned, and then about the missing functionality, like the mandatory properties and atomic operations. And then of course, named parameters always comes up. The PHP internals list. It's a tough crowd. I really enjoyed engaged in this project. So I don't mind it's part of it. I also really like this side discussion that we're having currently about ways to improve the way that we collaborate and make progress, especially on tough issues.

Derick Rethans 15:58 That has definitely improved over the last five years to a decade, but it can always be improved more, I would say. What is your end goal with this RFC? I guess you would like to see this added to PHP at some point, are you targeting it for PHP eight?

Jakob Givoni 16:13

I would be extremely proud to see this added to PHP at some point. And if it can make it into PHP eight in the first release, that would be awesome. That's at least what I'm going for, for now.

Derick Rethans 16:25

The PHP project is looking for release managers for PHP eight zero, with feature freeze happening at the end of June somewhere. So there's lesser and lesser time available for doing these things. So I'm curious to see where this ends up.

Jakob Givoni 16:39

It's a race against time at the moment.

Derick Rethans 16:42

But that's always the case, isn't it? I think be interesting to see if, if somebody wants to help out to make the implementation of this, or rather, I'd be interested to see whether you'd be able to pick up that yourself actually. We can always do with more people that work on a PHP language. Do you have anything else to add yourself?

Jakob Givoni 17:00

I'd say that I spent a lot of effort researching and writing this. And I just hope that people will study the RFC properly and keep an open mind. I know it's probably going to be a hard sell. And that's okay. I just wanted to give it a go. And this is just just the beginning of my contributions, I hope.

Derick Rethans 17:19

I spoke with Mate a little bit a few episodes ago. He was getting worried about it not getting accepted at some point. And I pointed out to him that scalar type hints took about a decade and seven attempts to finally make it into PHP. So it helps to just persist I would say in times.

Jakob Givoni 17:37

Times change and also you get new ideas and you evolve.

Derick Rethans 17:42

The language continues to improve and that's how I like it. Thanks, Jakob for taking the time to talk to me today. It was interesting to see what you're up to.

Jakob Givoni 17:51

My pleasure. Thank you so much Derick for having me.

Derick Rethans 17:56

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 48: PHP 8, JIT, and complexity

PHP Internals News: Episode 48: PHP 8, JIT, and complexity

In this episode of "PHP Internals News" I discuss PHP 8's JIT engine with Sara Golemon (GitHub).

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 48. Today I'm talking with Sara Golemon about PHP 8 and JIT. Sara, would you please introduce yourself?

Sara Golemon 0:33

Hi there. Hi there, everybody listening to PHP internals podcast. I'm Sara. I've been on this podcast before. But in case you're just getting here to for the first time, welcome to the podcast. You have a nice backlog to go through. I am a lapsed web developer, come database security engineer by day, and an opinionated open source dev slash PHP 7.2 release manager by night and also day. I've been involved with the project for about 20 years now off and on. Somehow I just keep coming back for more punishment.

Derick Rethans 1:03

We're leading up to PHP 8, with lots of new features being added. But one of the biggest thing in PHP 8 that I've spoken about on the podcast on before all the way back last year in Episode 7, is that PHP eight is going to get a JIT engine. Would you care to explain what a JIT engine does again?

Sara Golemon 1:20

Well, I'm going to give you the short, you can look this up on Wikipedia in two seconds definition of JIT, means just in time compilation. That doesn't really tell you much, unless you listen to it on the sort of other half of that of AOT, or ahead of time compilation. AOT is what you expect from applications like GCC, you know, you just make an application that you've got C or C++ kind of source code to that's ahead of time. JIT is saying, well, let's take the source for application. And let's just run with it. Let's just start executing it as fast as I can. And eventually we're going to get down to some compiled code. That's going to run a little bit quicker than the initial stuff did. PHP already has this nice little virtual machine built into it. We call it the Zend engine. That takes your script and immediately just says: All right, well, what does this say in computer terms? Well, a computer readable term is a series of these op codes, they're also called byte codes in other languages that give you instructions for: run this type of instruction at this time and get something done. The PHP runtime interpreter interprets that one instruction at a time basically pretending to be a CPU. This works quite well, it runs quite efficiently. But there's still this sort of bottleneck in the middle there of a program pretending to be a CPU running on top of a CPU in order to run other code. The idea of JIT is that this thing sitting in the middle is going to gradually figure out what your program really is trying to do and how it's intended to run, and It's going to take those PHP instructions and it's going to turn them all the way down into CPU instructions, so that it can get out of the way and let the CPU run your code natively as if it had been written in a compiled AOT kind of language. What that actually means for execution of PHP code in PHP 8 is still sort of a, you know, a question that's, that's left to be answered here. I listened to your interview with Zeev. Episode 7, is a good episode of getting some good information on that. We do definitely agree on what the status of the JIT within PHP is, right now we can. It's subjective facts like this is how much work has been done largely by Dmitri, where we can kind of expect to see the best gains come from. I personally think I might be a little bit more pessimistic than him in terms of the actual performance impact we get out of it. I think we both recognise we're not going to see the two to one kind of improvements we saw from five to seven. Nobody's realistically expecting that, but if you look at the demo that Zeev ran a few months ago, where he shows the Mandelbrot set being generated in two different PHP requests, and then WebSocket out to a nice pretty display, it's a very visceral reaction because you can see one Mandelbrot set being calculated much, much faster than the other. And he acknowledges though this is not realistic PHP code, nobody's writing the Mandelbrot calculation in PHP. We can see that under certain workloads, it's definitely getting faster. But for PHP core mission, which is web serving, I mean, we both know that it's not going to be massively fast. I think it's going to be almost imperceptibly fast.

Derick Rethans 4:41

One question for my site, the Mandelbrot set, the implementation of that is all in a specific function, right? And it's all CPU heavy code, not IO.

Sara Golemon 4:51

Yes.

Derick Rethans 4:52

And it's all that in the same function.

Sara Golemon 4:54

Yes.

Derick Rethans 4:55

Now, what I was thinking of the other day is that how does this interact with calling standard library functions, because the JIT engine is going to have to go out of basically running things on the CPU and calling things that are then implemented in C to begin with.

Sara Golemon 5:10

So you're asking that question, because you already know some of the pitfalls of JIT, and you're leading me into it. And that's fine. When a JIT emitter is taking the language that it's emitting, so PHP. As long as it remains within the scope of PHP, it can sort of keep track of where it's at. It's like, Okay, I know this variable's init, your because I saw it get set. I know that this is going on here. I know that's going on there. And it can carry those assumptions around as it's admitting code. And emit very efficient code that doesn't need a whole bunch of double check guards of like: Wait, is this still an integer? Wait, is that still a string? All of these sort of like escape hatches for when things go wrong. Anytime you cross over into, I will say C-land, or internals land, or ahead of time compiled land. It's basically calling into what it sees as a black box. And it just says: Okay, here's some data, I know the types going in, have fun with it. And something air quotes happening in the air happens with that code and the black box spits out an answer. Well, by the time the black box has spit out the answer, the JIT that has taken that PHP code, no longer knows if any of its assumptions are true or not. It just has to say: Well, time to start from scratch, time to keep track of where we are from here, build up a new set of assumptions. So we get this speed bump in the road of executing code. And it turns out most PHP applications are using a whole lot of those internal API's because they're quite useful. There is a kitchen sink in PHP, and it does stuff. So you have these repeated hits of this road bump happening, and that's not great. If we want to compare this to other JIT languages that are out there. I might suggest we compare this to HHVM because of course, HHVM, at least in the beginning implemented a fairly close kin cousin to the dialect of PHP. It has since diverge much more and become hacklang. But it was doing the same thing, taking PHP code, running it native on the CPU and occasionally having to make that cross to this its own version of internals, or it was running C++ code. One of the ways to reduce those numbers of jumps is that they took a lot of those internal functions, the ones that actually didn't need to do anything, particularly internals ish, and just rewrote them in PHP code. And if you look at the HHVM source code right now, there is a big directory called systemlib and that's a whole bunch of hacklang code, read it as PHP code, that is implementing a lot of these very common quote unquote internal functions. We just had an RFC for function called str_contains(), that is a function that could have been hundred percent been written just as PHP code. Something could have thrown that into packagist. For the record, I voted against it because of exactly that. I think you should write that in packagist and just put it in your composer.json is okay. It's gonna pass anyway, it got a lot of votes. That aside over, that is a sort of function that if we were putting it into sort of an 8.X version of PHP, where we did have our own type of systemlib, we would have probably just said, let's write that as PHP code. So that the JIT, when it enters that function, can keep all those assumptions intact, and potentially even inline some of those instructions and avoid the function call entirely. That's basically taking all of the instructions that are part of the in this case, str_contains() function, and implementing them within the scope of the function that was calling it. So you skip that entire function call overhead, which a lot of people know is still one of PHPs sort of weaker points in terms of where that fat to trim is, as Zeev said in Episode Seven, we still have some parts of PHP that are a bit slow, irrespective of a JIT.

Derick Rethans 8:50

There are actually a few functions that have been inlined now into op codes. strlen() is an example of this where instead of it now being a function call, it's actually directly an opcode. Because it is a function that is used so much and actually gain a bit of performances there.

Sara Golemon 9:05

Yeah, I think all of these functions as well are just a single opcode for type check. Yeah.

Derick Rethans 9:10

There's a whole bunch of them for sure. I saw that earlier this morning, Dmitri produced, or proposed another branch in which he implemented tracing JITs, instead of the JIT that we already have, and I have no idea what the difference is between a normal JIT engine and the tracing JIT engine,

Sara Golemon 9:25

Ultimately, the distinction is not that important to end users, it's going to function the same, but it is a sort of an internal implementation detail. HVVM's by the way, is a tracing JIT. It basically looks at any given unit of work that it needs to translate, let's say a function, and it says, what are the pieces that have these sort of non branching parts attached to them? Let me look at each of the non branching pieces. And let me create a version of that translation based on the types that I expect to be going in there. If the types fail, I'm gonna have to create a new version of that piece. But then that piece can plug into this sort of chain of tracelets to create a full function. Most of the time, especially if you've written code that is well type hinted, you've got, you know, strict types turned on, you've got all of your types on the on the function parameters set. And it's very easy for the JIT to infer the types out of what you've put into your function. You're only ever going to need to create a single tracelet of any given section, and your full trace is going to be a single, unbroken chain of: do this, do this, maybe do a jump to another spot, just keep doing this, doing this, doing this. If you have, let's say, slightly messier code, maybe you're not using any kind of type hinting it becomes very difficult to infer any of the types, because there's lots of different call sites, that are doing lots of different things. We may end up having some functions that have multiple tracelets per body section that get built into the giant bush of interconnected edges, that's less ideal in terms of maximising performance, but it still at least functions.

Derick Rethans 11:06

We have spoken a little bit about what a JIT engine is and sort of how it works. It sounds quite complex and complicated.

Sara Golemon 11:14

It is definitely complicated. And I'm feeling like that's another lead. And so I'll just run with it.

Derick Rethans 11:19

I've also got to say my next leading question... Maybe I should actually ask the question?

Sara Golemon 11:24

Well, let's actually take a step back from the JIT for a second. And let's look at where the engine is right now. So the engine is basically two very large pieces. That's the sort of the extension library of all of the runtime functions. Everything you see exposed in user space, and the actual scripting engine. There are some other smaller pieces, but those are two, the two really big pieces. There are a whole lot of people pay a whole lot of attention to the extension piece, because that's the flashy bit. That's the part that gives you some bit of binding that you didn't have before, or some bit of functionality that can be delivered out of the box as part of that kitchen sink. And that definitely needs attention. I'm glad that that continues to evolve. But the scripting engine is that piece that defines syntax and how code is actually going to run.

Derick Rethans 12:09

Reading extension's code as a whole lot easier than reading the engine code.

Sara Golemon 12:13

And that's where I was going to go with that, yes, if you look at the code that's under ext, you can even come into that code without knowing any C at all. And you can actually make pretty good sense of a lot of it because a) PHP uses a whole lot of macros. So every function is literally defined with a macro that says: PHP_FUNCTION, like right here, PHP function, every class method, PHP_METHOD, here's the class name. Here's the method name. And what these things do are pretty clear sort of API's. They're very small bite sized pieces for the most part. The bits that involves sort of defining a class and how it does its memory management, those get a little bit more complicated, but I think on the whole extension code is far more accessible. If you go and look at the engine, particularly the runtime pieces of the engine, although the compiler is complex as well. You have to do a lot of digging before you even get to a point that you can see how the pieces maybe start to fit together. You and I have spent enough time in the engine code that we know where to look for a particular thing. Like let's say that opcode, you mentioned that implements strlen(). We know that, oh, zend_vm_def.h has got the definition for that. We also know that that file is not real code. It's a pre processed version of code that gets built later on. Somebody coming to that blind is not going to see a lot of those pieces. So there's already this big ramp up just to get into these engine as it exists now in 7.4. Let's add JIT on top of that. You've got code that is doing call forward graphs, and single static analysis, and finding these tracelets, and making sense of the code at a higher level than a single instruction at a time, and then distilling that down into instructions that the CPU is going to recognise. And CPU Instructions are these packed complex things that deal with immediates, and indirects, and indirects of indirects, and registers. And the x86 call ABI is ridiculous thing that nobody should ever have to look at. So you add all this complexity to it, that by the way, sits in ext/opcache. It's all isolated to this one extension that reaches into the engine, and fiddles around with things to make all this JIT magic happen. You're going to take your reduced set of developers who know how to work on Zend engine, and you're going to reduce that further. I think at the moment, it's still only about three or four people who actually understand how PHP's JIT is put together enough that they can do any effective work on it. That worries me for sure. I don't think that's an insurmountable hill to climb, especially if we can start getting some documentation written about it, at least from a high level point of view. Hey, you know, look over here to find this stuff. Look over here to find that stuff. Something to get started. So the people who have at least that basic understanding of how the VM part of the Zend engine works can sort of upgrade their knowledge to get into to the JIT. I only think that's worth it. If we actually get real performance boost out of JIT. If we actually turn the JIT on, and we see that for PHP's core workload, which is web serving, we're only seeing a one to 2% gain. For me, that's not enough. It may be enough for others. But for me, I would call that experiment, not a failure, but a non success at that point. Certainly there are people out there who are still going to want to use it, because they are you doing command line applications, and they're doing complex math. And I'm not saying we can't have it. I'm just saying it takes less than a forward stage that point.

Derick Rethans 15:43

Somebody mentioned earlier in the chat room. It's also another set of potential bugs, right?

Sara Golemon 15:48

It is definitely another potential bugs.

Derick Rethans 15:51

It's pretty much another implementation of the PHP syntax bits of PHP.

Sara Golemon 15:57

So if you run an application and you get behaviour you don't expect, where is that behaviour actually coming from? You can spend a lot of time looking in Zend engine because you're thinking like: Oh, well, this is the thing that executes opcodes. And when I run it in a single command line, it's definitely going through this bit of code, but it works on a single command line run. But at the twentiest request on my web server, it's not working. Why is that happening? Well, it turns out, it's happening, because that's when the JIT has finally kicked in, because it has enough information. And it's running through this tracelet that was just a little bit wrong. And well, crap. You mentioned I think, at one point, when we were talking in Miami just a couple months ago, that you're just gonna have to turn the JIT off entirely when Xdebug is running,

Derick Rethans 16:41

Just like I'll already turn OPCache optimizations off, because there's just too confusing for people.

Sara Golemon 16:46

It's confusing and complex, but it's also it may not even be 100% possible because we are right there down at the bare metal of running CPU instructions. There's not a lot of opportunity to just say like, Oh, hold on Mr. CPU, let me just take a look at your registers right now. Okay, this is okay, let's go ahead and keep going now. The VM that we have now in in Zend lends itself 100% to those kinds of activities, CPU does not. What that means is that what we experience in the development mode with Xdebug running is not going to be the exactly the same thing that we experience in real runtime code. And I don't know if we have a solution for that.

Derick Rethans 17:23

As far as I know, there's no solution for it at all.

Sara Golemon 17:26

I was trying to cage it in the hope that maybe we could someday have solution for it.

Derick Rethans 17:30

It'd be lovely, but I can't see that happening to be honest. I think it's going to be important to find out how much this actually benefits, real live code. How does it benefit your Laravel project or your Symfony project or anything like that? I think it's going to be hard to now make a case for not shipping PHP 8 with a JIT. I think that'd be a bit unfair. But on the other side, if it's, as you say, only really gives you one or 2%, whether this is worth have the additional complexity. The additional maintenance burden as well as another opportunity for having bugs that are a lot harder to reproduce, but it's actually worth having it at all?

Sara Golemon 18:11

I definitely don't want to poopoo on the JIT effort.

Derick Rethans 18:14

Oh, no, absolutely not.

Sara Golemon 18:15

I think this is an important experiment to run. And I think if 8.0 as a whole winds up being a sort of public beta experiment of it, that will definitely give us a lot of good information. And I am super hopeful that we see better percentages, that we see 5-10 maybe even 15%

Derick Rethans 18:31

Absolutely.

Sara Golemon 18:32

I want to be guarded in what I how I talk about it on a podcast like this because I don't want anybody say: Oh, 8's gonna be great. Our code is gonna run 10 times as fast as it was running before No, that's not gonna happen two x is not gonna happen. We're talking much lower numbers than that. Be guarded, be hopeful, but 8.0 is going to be, as I said, it's going to be that sort of public beta experiment.

Derick Rethans 18:55

I think that's great. I think running this experiment again because ta similar experiment was, of course run during the PHP 5.6 days when PHP 7 came out. Originally with PHP 7, was PHP with a JIT engine. And then Dmitri and others found out that it was so much other things that could be done to make PHP run pretty much twice as fast.

Sara Golemon 19:16

Yeah, there was a lot of really low hanging fruit.

Derick Rethans 19:19

Yep. And that was great to see. I am apprehensive about people thinking that the JIT engine in PHP eight is going to similar performance boost.

Sara Golemon 19:29

We'll see. Nothing to say about it, but then: we'll see.

Derick Rethans 19:32

But I would suggest is that if you're interested in seeing what this can do for your projects, you should go try it out. Download PHP's master branch, enable it and see how it goes.

Sara Golemon 19:41

And of course, make sure you are running on x86 hardware. I doubt very much that he's bothered to put more than one back end on this.

Derick Rethans 19:48

I don't actually know.

Sara Golemon 19:49

I haven't looked. He might be using some helper library for it. So it's possible that we're hitting multiple backends. But this is probably going to be an x86 only thing and possibly a Linux thing. I should find out the answer to that question.

Derick Rethans 20:00

I should do too. Okay, Sara, thanks for taking the time this morning to have a chat with me about PHP 8' JIT efforts.

Sara Golemon 20:08

It's fun as always, I always love to speak with you Derick. You bring a bright Corona of sunlight to my day.

Derick Rethans 20:16

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 47: Attributes v2

PHP Internals News: Episode 47: Attributes v2

In this episode of "PHP Internals News" I chat with Benjamin Eberlei (Twitter, GitHub, Website) about an RFC that he wrote, that would add Attributes to PHP.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 47. Today I'm talking with Benjamin Eberlei about the attributes version 2 RFC. Hello, Benjamin, would you please introduce yourself?

Benjamin Eberlei 0:34

Hello, I'm Benjamin. I started contributing to PHP in more detail last year with my RFC on the extension to DOM. And I felt that the attributes thing was the next great or bigger thing that I should tackle because I would really like to work on this and I've been working on this sort of scope for a long time.

Derick Rethans 0:58

Although RFC startled attribute version two. There was actually never an attribute version one. What's happening there?

Benjamin Eberlei 1:05

There was an attributes version one.

Derick Rethans 1:07

No, it was called annotations?

Benjamin Eberlei 1:08

No, it was called attributes. There were two RFCs. One was called annotations, I think it was from 2012 or 2013. And then in 2016, Dmitri had an RFC that was called the attributes, original attributes RFC.

Derick Rethans 1:25

So this is the version two. What is the difference between attributes and annotations?

Benjamin Eberlei 1:30

It's just a naming. So essentially, different languages have this feature, which we probably explain in a bit. But different languages have this. And in Java, it's called annotations. In languages that are maybe more closer home to PHP, so C#, C++, Rust, and Hack. It's called attributes. And then Python and JavaScript also have it, that works a bit differently. And it's called decorators there.

Derick Rethans 1:58

What are these attributes or annotations to begin with?

Benjamin Eberlei 2:01

They are a way to declare structured metadata on declarations of the language. So in PHP or in my RFC, this would be classes, class properties, class constants and regular functions. You could declare additional metadata there that sort of tags those declarations with specific additional machine readable information.

Derick Rethans 2:27

This is something that other languages have. And surely people that use PHP will have done something similar already anyway?

Benjamin Eberlei 2:35

PHP has this concept of doc block comments, which you can access through an API at runtime. They were originally I guess, added as part or of like sort of to support the PHP doc project which existed at that point to declare types on functions and everything. So this goes way back to the time when PHP didn't have type hints and everything had to be documented everywhere so that you at least have roughly have an idea of what types would flow in and out of functions.

Derick Rethans 3:07

Why is that now no longer good enough?

Benjamin Eberlei 3:09

Essentially, user land developers use doc blocks to put metadata in there, and you could access them through an API. We had two sort of standards, or we still have two standards that use this. The documentation standard coming from the PHP documentor community. And then mostly runtime use case that exists now is covered by the doctrine annotations library, which, incidentally, I have also worked on a lot. It is used, for example, by the Symfony community, by the Drupal community, and by a few other communities as well that are smaller that wanted to go into the direction of using annotations in this case or attributes.

Derick Rethans 3:53

What would doctrine use an annotation for?

Benjamin Eberlei 3:55

I said before that annotations, add metadata to declarations. So let's say you have in your code, for example, classes that you want to store in the database. So you need to map PHP classes to database tables and back. Usually, you would do that using some kind of configuration. And configuration can be many folds. So the easiest way would be to write this in PHP, say, this is the column name, this is the field name, this is the class name and then store and use this information. And then you can go and store this in ini files, yaml files, XML files. The problem with this kind of approach is often that you have the configuration file and you have the class, and they are totally separate from each other, usually in very different places of the codebase. This is not some kind of configuration that is fluid. It's very, very static configuration that depends on the class. And it will not really change unless the class also changes. So changes are usually done together. In this case, it might make sense to put the configuration on to the class. Because then you see the declaration, you see it's configured in some way. And then you can more easily understand that changes affect each other in some way. And it leads to less mistakes, in my opinion. And it makes it a little bit more obvious that the class is used in some configured way.

Derick Rethans 5:26

We've had a quick look at what annotations are. The RFC introduces them in a different way, the attributes that you're not proposing, how are they different from the doc block comments?

Benjamin Eberlei 5:37

The idea is that we introduce a new syntax that is independent of the doc block comments. Essentially, before each declaration, you can use the lesser than symbol twice, then the attribute declaration, and then the greater than sign twice. This is the syntax I've used from the previous attributes RFC. And Dmitri at that point used the syntax from Hack. And it makes sense to reuse this not because Hack and PHP are going in the same direction any more. But because Hack at that point they introduced it that they had the same problems with which symbols are actually still easy to use. And we do have a problem in PHP a little bit with the kind of sort of free symbols that we can still use at certain places. And lesser than and greater than at this point are easy to parse. There are a bunch of alternatives and one thing that I will probably propose is an alternative syntax where we start with a percentage sign, then the square bracket open and then a square bracket close. This is more in line with how Rust declares attributes. While Rust uses the sort of the hash symbol, which we can't use because it's a comment in PHP.

Derick Rethans 6:54

And you don't want to use emojis.

Benjamin Eberlei 6:55

Some crazy people propose to use emojis which would easily work in PHP, but I guess it would be hard to remember the number to get the Unicode sign.

Derick Rethans 7:06

Within the two opening lesser than signs and two greater than signs to close it. What's in the middle?

Benjamin Eberlei 7:12

You declare an attribute name. And then you sort of have a parenthesis open, parentheses close, to pass optional arguments. You don't have to use them. So you can only use the attribute name. If you sort of want to tag something: just this is a validator, or this is an event listener, whatever you come up with, to use attributes for. But if you need to configure something in addition, then you can use. The syntax sort of looks like if you would construct a new class, except that you don't have to put the new keyword in front of it.

Derick Rethans 7:45

It looks like function arguments pretty much.

Benjamin Eberlei 7:47

Yes, exactly. Yeah.

Derick Rethans 7:48

What kind of values can you use in the optional arguments to the attributes?

Benjamin Eberlei 7:53

The attributes are not really runnable code in a way. Since they are declarations, they don't allow arbitrary PHP code to run there. What is obviously allowed a simple literal values, so a number, or a fixed string, a fixed array declaration, and all this kind of things are possible. What is also possible is exactly the same expressions that you can also declare in class constants. So, in the class constants, you can do simple mathematical expressions, you can reference other constants. So, this is something that will be very interesting for attributes to do reference class names for example.

Derick Rethans 8:34

What happens if you define an attribute on a declaration element?

Benjamin Eberlei 8:38

What happens is that while the PHP script gets compiled, it will see that there are attributes declared and it will parse the attributes and similar to the doc block store them on the internal structure for future reference. Attributes are parsed in my current proposal in a way that you can have every attribute just once. This is something that is still under heavy discussion, because there are a few good ideas why you would need two, or multiple. Essentially similar to how a doc block is a string, we then store an array, which represents the attributes belonging to the class or the function or the constant. And this is something that the engine stores and also stores it in OPCache.

Derick Rethans 9:27

How would you access these attributes?

Benjamin Eberlei 9:28

Attributes are accessed through the reflection API. The reflection API also allows access to doc blocks. For attributes that would be a new function called getAttributes(). And it returns a list of all attributes using a new reflection class called ReflectionAttribute. There you can access what name does this attribute have? What are the arguments that are passed? And then this goes into one of the next features of this RFC proposal. You can also ask it to return this attribute as an object instance.

Derick Rethans 10:05

An object instance of which class though?

Benjamin Eberlei 10:07

Attributes, and this is something that is different to the initial version, the version one attributes RFC is, attributes names resolve to class names. That means if you declare an attribute, for example, Foo, and you have an import for our class, MyApplication/Foo, then during passing the attribute will be resolved to my attribute view name. It uses the same mechanism for class resolving that is used in every script. It reflects the use statements that are declared in the file. And you can use namespaces, namespace operators to reference the attributes as well.

Derick Rethans 10:49

These are attributes not classes, so I don't quite see all the link between the attribute names in the classes is?

Benjamin Eberlei 10:55

One problem with the original doc block based system was that there are conflicts between attributes of different systems. One library would have a type annotation, or a var annotation, and some other library would also use it. This could lead to conflict if the syntax for them was slightly different. So this would lead to problems when multiple parses would use the same attribute. And they would parse them differently. And this could lead to errors. One problem that was mentioned in the initial attributes RFC and that, I think, if you vote us all so used as a reason for voting no is that there was no namespacing, which means that different libraries could clash and their use of attributes. My idea was we already have classes, we have namespacing. We can resolve this by using this mechanism. You declare an attribute and an attribute always resolves to a class. In the best case scenario, you would also declare this class in your code. Essentially, the attribute is not an attribute, but it's a special class that represents an attribute. This is also shown in the code that by having an additional interface, or a sort of a marker interface, that attributes can implement to make it obvious that they they are used as an attribute.

Derick Rethans 12:19

You mentioned that you could access the attributes through reflection API, and you can get them out as an object?

Benjamin Eberlei 12:25

Yes, this is why I mentioned before that the syntax sort of looks like constructing a new object, but without the new keyword. When you access the objects through the reflection API, it would essentially instantiate the class, and all the arguments that you put into the attribute declaration are passed into the constructor of the object. And this is why the connection is there between a class and an attribute. It directly goes to instantiating the attributes as an object using the arguments and giving the developer access to them.

Derick Rethans 13:00

Does it only do something like this when you use the getObject() on the reflection arguments? Or is it also possible that I don't care about these classes things whatsoever, and I can just get a list of attributes and their optional values that are associated with them?

Benjamin Eberlei 13:16

You don't have to have a class, and the class name resolving in PHP is independent of classes actually existing. The attributes RFC respect that. You can just import anything that is not a class and use an import statement to shorten the attribute usage, or you can use the absolute namespace syntax to put a fully qualified attribute name into your code. And it wouldn't fail. The fail would only happen when you call the method on ReflectionAttribute to get the attribute as an object. So this is something the RFC is also in flux with and about to change it. The first version mentioned that attributes will always be auto loaded when they are declared at compile time. This would essentially treat attributes similar to base classes or interfaces, in a way that they are always resolved, they're always checked. However, this is a little bit overkill for userland attributes. And a lot of feedback was related to this should only happen when the reflection API is used. So I'm going to change this. One thing that we do need to handle in a way is a built in attributes. One reason why I want to add this RFC as well is that there are a few use cases coming up in PHP itself, that could benefit a lot if we had built in attributes. Since we don't have a clear path forward there. But Nikita has published his ideas on editions. So there's some paths forward to having PHP code work slightly differently depending on what developers want. Attributes could be helpful there. Other things for example, the JIT. JIT has features where you can at the moment use doc block comments to declare methods as always JIT-able or never JIT-able. Dmitri used doc block comments to check for JIT or no JIT tag in there. This is essentially something that attributes should be used for because should be machine readable. Then there's a lot of other stuff that for example, Rust also put forward that PHP is struggling with: conditional declarations of functions. For example, Symfony has a polyfill library that adds functions that are in higher languages, re implements them in a way that they're also available in lower versions where they don't exist in core. There are a lot of hacks around the sort of conditional declaration of functions and classes and stuff that make it difficult for OPCache to actually cache the files. I believe there are also even more problems if you use these kind of fights with pre loading. Essentially what could be done with attributes would be something like conditionally declared as function only if it's on PHP 7.3 and lower something like this.

Derick Rethans 16:13

You just mentioned using JIT or no JIT as an annotation. Does that also mean that extensions have easy access to these attributes?

Benjamin Eberlei 16:21

OPCache's not a PHP core functionality. It's still its own extension. The idea is that extensions have access to attributes in a very simple way. So there will be a Zend API, sort of an internal name for an API that the Zend engine provides to extensions and extensions will be able to access attributes and make decisions based on this. Extensions can already hook into the compile step of PHP and there's a hook called zend_ast_process. During AST processing, you can do stuff. That would be one way to, for extensions to look at attributes and maybe change code if they want. Then the engine obviously has tonnes of other hooks where the declarations are available in the data structure that the Zend engine provides. So there's zend_class_entry, for example, where you could look into the attributes as an extension and make decisions.

Derick Rethans 17:20

This is a pretty new RFC, and hence there're always going to be few open issues. Because we like to argue about stuff. What are the open issues on this RFC?

Benjamin Eberlei 17:29

This is the seventh RFC on this topic. So there has been a lot of discussion. I guess this feature is, in a way quite controversial because of the implementation details. A lot of my work now will be to find the best implementation that can actually make this feature part of core by getting enough votes for it. And so I gathered a lot of feedback from the community; also talked a lot to contributors. Changes that I will be probably doing is allowing multiple attributes. What I said before, the auto loading has to be clarified. There has to be some distinction between internal attributes and user land attributes in a way that doesn't require auto loading. Hack, for example, has __ as a magic prefix, which I want to avoid, because it puts up all this magic methods, sort of argument back on the table. We need to have something to make a distinction between userland and internal attributes, because the internal attributes need to be validated very strictly at compile time. And the userland attributes need to be validated only when you call the getAsObject() method on the reflection API.

Derick Rethans 18:42

How long do you think there'll be before you put this RFC up for a vote?

Benjamin Eberlei 18:46

It's a bit tricky because this issue is so controversial. I don't want to invest month of work and then get a no vote. And so I do want to have some feedback quite quick enough. I do realise that the first draft needs some work and clarifications that would otherwise lead to no votes from contributors. So I hope to get this done in, let's say, two to four weeks of additional work.

Derick Rethans 19:09

All right, Benjamin. That was a great explanation of the attributes version two RFC.

Benjamin Eberlei 19:16

Thank you for having me, and I really appreciate it again.

Derick Rethans 19:21

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.


PHP Internals News: Episode 46: str_contains()

PHP Internals News: Episode 46: str_contains()

In this episode of "PHP Internals News" I chat with Philipp Tanlak (GitHub, Xing) about his str_contains() RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 46. Today I'm talking with Phillipp Tanlak, about an RFC that he's made titled str_contains. Phillipp, would you please introduce yourself.

Philipp Tanlak 0:35

Hey, Derick. My name is Philipp. I'm 25 years old and I live in Germany. I work for an IT service company, which does mainly development and maintenance of IT projects. We specialise in the maintenance of e-commerce website and create enterprise applications.

Derick Rethans 0:52

How long have you been using PHP for?

Philipp Tanlak 0:54

I've been using PHP for quite a long time now that might be six years I guess.

Derick Rethans 0:58

What brought to you creating an RFC?

Philipp Tanlak 1:02

The main reason I've created this RFC was out of necessity and interest, mainly to scratch my own itch.

Derick Rethans 1:08

That is how most things make it into PHP in the end isn't it?

Philipp Tanlak 1:11

Yeah, I guess.

Derick Rethans 1:12

The RFC is titled str_contains, that tells me something that is about strings and containing things. How do we currently find a string in a string?

Philipp Tanlak 1:22

The current approach to find the string in a string is to use the strpos() function or the strstr() function. But on Reddit, I found someone also use preg_match which I find kind of interesting.

Derick Rethans 1:35

There are multiple amount of different methods in use, what are the general problems with these approaches that people have made?

Philipp Tanlak 1:41

So the current approach which I find is not very intuitive, and mainly because of the return values of these functions. For example, the strpos() returns either the position where the string is found, or a false value if the string is not found, but there has to be a check with a !== operation, and the strstr() function just returns a string. So you have to convert that to a boolean to check if the string is found or not.

Derick Rethans 2:11

Because with strpos(), if you wouldn't use the === or !== operator. Of course, if it would find it at the first position of the string, it'd be zero position, and it would return false, even though it's sfound it.

Philipp Tanlak 2:26

Yeah.

Derick Rethans 2:27

So there's a few different problems with these things. Also, I don't think it's particularly vary intuitive to do because you sort of need to come up with like a whole construct to see whether it's part of a string.

Philipp Tanlak 2:37

Correct. I don't think it's intuitive for a beginner. So if someone is learning PHP for the first time, then he has to search through the documentation, what are the exact return values for these functions, and has to remember that so I thought, string or str_contains() might be a better fit for that to just return a true or false value.

Derick Rethans 2:58

We've mentioned str_contains() a few times now, I guess the RFC is producing to add this function. How would this function differ from what PHP already has?

Philipp Tanlak 3:07

So this function does not differ in a lot of ways. It's basically the same implementation of the strpos() function. But instead of returning the position of the found string, it just simply returns it as a boolean value. So either true or false.

Derick Rethans 3:23

I can imagine some people will say, well, you can just do this in your own wrapper function, right? Because pretty much what it deos is converting the results from strpos() to a boolean. But you must have a good reason of why to want to add an extra function here.

Philipp Tanlak 3:38

The reason for this function, and maybe someone might disagree is, mainly a user experience for the developer. So this is just out of necessity which I found, and I've been using this function quite a lot. So I thought this might be a valid add to the PHP language. So I tried to implement it and it got some great reviews. So I thought that wasn't a very bad idea I had.

Derick Rethans 4:04

Is the RFC suggesting just out a single function: str_contains().

Philipp Tanlak 4:09

Yes, the RFC is currently adding just a single function, which is the str_contains(). When I first submitted the discussion about this RFC, there were quite a few people asking why is there no case insensitivity or multibyte versions for these, and I did not think of those at first. But in the discussion, it became clear that the multibyte version did not seem to be very necessary because the comparison is going to be byte by byte. Unlike strpos(), the position of the found string is not relevant. So it doesn't matter if there is any difference in encoding.

Derick Rethans 4:47

I remember in last year, there was another RFC related to strings functions they were the string_starts_with() and a string_ends_with(). Those are two functions and there were also variants for both case insensitivity, ss well as multibyte. Which made eight different functions to be added to pretty much do a single thing. That RFC failed, potentially because there are so many things being added.

Philipp Tanlak 5:11

Yeah, that was also the main reason, I think the case insensitivity of this function, or the variant of it was not so relevant. So I did not include it into the RFC just because of this case you mentioned. So instead of polluting the global space with more functions, someone suggested to just advance PHP incrementally and add in case sensitivity for this function just if it is necessary.

Derick Rethans 5:37

This is a common recurring subject. Most of the people I spoke with in the last few episodes are all adding things to PHP bit by bit instead of coming up with big RFCs which I think is a good way of going forwards. When reading the RFC, I had a quick look at which argument the function would accept. PHP of course this weakly typed strings in most of time. Is this str_contains() function handling distinct different from what strpos() does for function arguments.

Philipp Tanlak 6:10

So the str_contains() function uses the same internal function, which is php_memnstr(), if I recall correctly. It tries to interpret it as a string. And if it's not a string, it either throws a warning or notice, but I've just run some checks and it seems like in the next PHP version, non string values which are passed into the string functions will be interpreted as a string, and if that is not the case, it will throw an error or usually return false.

Derick Rethans 6:43

So it doesn't do any special magic, and just relies on the PHP tends to do for parsing arguments and weak and strict typing.

Philipp Tanlak 6:51

Yes, that's correct.

Derick Rethans 6:53

Most RFCs they come with a patch, as does yours. How did you find it getting started with writing things for PHP instead of using PHP.

Philipp Tanlak 7:02

So basically, I've looked at the PHP source code in the past, just to see how things are implemented. And I had some basic background in C. So I thought that this was not very hard for me. Most of the functions or things I had to do to include this patch, were already there. So basically, I just copied the strpos() function and remove the, when the string is found, use the position to calculate a new string and just remove that code and return the boolean value from the found position.

Derick Rethans 7:35

Because it is not a very different function from strpos(), it's just pretty much a different return type. It's a lot easier to do.

Philipp Tanlak 7:44

Yeah.

Derick Rethans 7:45

When looking at feedback, what were the main criticisms of this?

Philipp Tanlak 7:48

The main criticism of this was basically just the variants of these functions. So mainly the multibyte variant or the in case sensitivity. Other than that, the response was very, very nice and, and also very rewarding for me. So I thought I did a good job on this. And many people wanted to have this function in PHP, but either did not have the time to implement it or it was too easy. I'm not sure how that went. But I think the response from the devs and the overall PHP community was very nice.

Derick Rethans 8:23

The RFC is already in voting, so I'm I'm a bit late to talk about them. Usually I'm and things are still in discussion. And at the moment, it looks like it is passing because the votes are 43 to 6 with another weeks ago, then.

Philipp Tanlak 8:37

Yeah.

Derick Rethans 8:37

Do you think this will be your last RFC? Or do you have something else in mind?

Philipp Tanlak 8:41

At the time of this recording I don't have anything else in mind, but maybe if I find something. Since I'm working with PHP on a daily basis, which I think is worth adding to PHP I might create a new RFC.

Derick Rethans 8:54

That's how I started and see what happens now. Thank you for taking the time to talk to me today Phillipp, I hope you enjoyed this.

Philipp Tanlak 9:01

Yeah, thanks for having me Derick.

Derick Rethans 9:05

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.