1// Copyright (C) 2016 The Qt Company Ltd.
2// Copyright (C) 2016 Intel Corporation.
3// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR LGPL-3.0-only OR GPL-2.0-only OR GPL-3.0-only
4
5/*!
6 \class QUrl
7 \inmodule QtCore
8
9 \brief The QUrl class provides a convenient interface for working
10 with URLs.
11
12 \reentrant
13 \ingroup io
14 \ingroup network
15 \ingroup shared
16
17 It can parse and construct URLs in both encoded and unencoded
18 form. QUrl also has support for internationalized domain names
19 (IDNs).
20
21 The most common way to use QUrl is to initialize it via the constructor by
22 passing a QString containing a full URL. QUrl objects can also be created
23 from a QByteArray containing a full URL using QUrl::fromEncoded(), or
24 heuristically from incomplete URLs using QUrl::fromUserInput(). The URL
25 representation can be obtained from a QUrl using either QUrl::toString() or
26 QUrl::toEncoded().
27
28 URLs can be represented in two forms: encoded or unencoded. The
29 unencoded representation is suitable for showing to users, but
30 the encoded representation is typically what you would send to
31 a web server. For example, the unencoded URL
32 "http://bühler.example.com/List of applicants.xml"
33 would be sent to the server as
34 "http://xn--bhler-kva.example.com/List%20of%20applicants.xml".
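
The relationship between the two forms can be illustrated with a minimal
sketch that uses only the C++ standard library rather than QUrl itself. The
helper name percentEncodePath() is hypothetical. It is deliberately stricter
than QUrl: it percent-encodes every byte of a UTF-8 path except the RFC 3986
unreserved characters and '/', and it does not attempt the IDN conversion of
the host name.

```cpp
#include <string>

// Hypothetical helper, not part of QUrl: percent-encode every byte of a
// UTF-8 path that is neither an RFC 3986 unreserved character nor '/'.
std::string percentEncodePath(const std::string &path)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : path) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved || c == '/') {
            out += static_cast<char>(c);
        } else {
            // Uppercase hex digits, as in the encoded example above.
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

With this helper, the path "/List of applicants.xml" becomes
"/List%20of%20applicants.xml", which matches the path of the encoded URL
shown above.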
35
36 A URL can also be constructed piece by piece by calling
37 setScheme(), setUserName(), setPassword(), setHost(), setPort(),
38 setPath(), setQuery() and setFragment(). Some convenience
39 functions are also available: setAuthority() sets the user name,
40 password, host and port. setUserInfo() sets the user name and
41 password at once.
42
    Call isValid() to check if the URL is valid. This can be done at any point
    during the construction of a URL. If isValid() returns \c false, you should
    clear() the URL before proceeding, or start over by parsing a new URL with
    setUrl().
47
48 Constructing a query is particularly convenient through the use of the \l
49 QUrlQuery class and its methods QUrlQuery::setQueryItems(),
50 QUrlQuery::addQueryItem() and QUrlQuery::removeQueryItem(). Use
51 QUrlQuery::setQueryDelimiters() to customize the delimiters used for
52 generating the query string.
53
54 For the convenience of generating encoded URL strings or query
55 strings, there are two static functions called
56 fromPercentEncoding() and toPercentEncoding() which deal with
57 percent encoding and decoding of QString objects.
58
59 fromLocalFile() constructs a QUrl by parsing a local
60 file path. toLocalFile() converts a URL to a local file path.
61
62 The human readable representation of the URL is fetched with
63 toString(). This representation is appropriate for displaying a
64 URL to a user in unencoded form. The encoded form however, as
65 returned by toEncoded(), is for internal use, passing to web
66 servers, mail clients and so on. Both forms are technically correct
67 and represent the same URL unambiguously -- in fact, passing either
68 form to QUrl's constructor or to setUrl() will yield the same QUrl
69 object.
70
71 QUrl conforms to the URI specification from
72 \l{RFC 3986} (Uniform Resource Identifier: Generic Syntax), and includes
73 scheme extensions from \l{RFC 1738} (Uniform Resource Locators). Case
74 folding rules in QUrl conform to \l{RFC 3491} (Nameprep: A Stringprep
75 Profile for Internationalized Domain Names (IDN)). It is also compatible with the
76 \l{http://freedesktop.org/wiki/Specifications/file-uri-spec/}{file URI specification}
77 from freedesktop.org, provided that the locale encodes file names using
78 UTF-8 (required by IDN).
79
80 \section2 Relative URLs vs Relative Paths
81
82 Calling isRelative() will return whether or not the URL is relative.
83 A relative URL has no \l {scheme}. For example:
84
85 \snippet code/src_corelib_io_qurl.cpp 8
86
87 Notice that a URL can be absolute while containing a relative path, and
88 vice versa:
89
90 \snippet code/src_corelib_io_qurl.cpp 9
91
92 A relative URL can be resolved by passing it as an argument to resolved(),
93 which returns an absolute URL. isParentOf() is used for determining whether
94 one URL is a parent of another.
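
The rule that a relative URL is one without a scheme can be sketched with the
standard library alone. The function name looksAbsolute() is hypothetical, and
the check follows the RFC 3986 scheme grammar (a letter followed by letters,
digits, "+", "-" or ".", terminated by ":"). QUrl's own parser may differ in
details.

```cpp
#include <cstddef>
#include <string>

static bool isAsciiAlpha(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static bool isAsciiDigitChar(char c)
{
    return c >= '0' && c <= '9';
}

// Hypothetical helper: true when `url` starts with an RFC 3986 scheme
// followed by ':', that is, when the URL is not relative.
bool looksAbsolute(const std::string &url)
{
    if (url.empty() || !isAsciiAlpha(url[0]))
        return false;
    for (std::size_t i = 1; i < url.size(); ++i) {
        const char c = url[i];
        if (c == ':')
            return true;              // a complete scheme was found
        if (!isAsciiAlpha(c) && !isAsciiDigitChar(c)
            && c != '+' && c != '-' && c != '.')
            return false;             // not a scheme character
    }
    return false;                     // no ':' at all
}
```

Under this sketch "file:relative/path" is an absolute URL with a relative
path, while "/absolute/path" is a relative URL with an absolute path.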
95
96 \section2 Error checking
97
    QUrl is capable of detecting many errors in URLs while parsing them or when
    components of the URL are set with individual setter methods (like
    setScheme(), setHost() or setPath()). If the parsing or setter function is
    successful, any previously recorded error conditions will be discarded.

    By default, QUrl setter methods operate in QUrl::TolerantMode, which means
    they accept some common mistakes and misrepresentations of data. An
    alternate method of parsing is QUrl::StrictMode, which applies further
    checks. See QUrl::ParsingMode for a description of the differences between
    the parsing modes.
108
    QUrl only checks for conformance with the URL specification. It does not
    try to verify that high-level protocol URLs are in the format expected by
    the handlers that consume them. For example, the following URIs are
    all considered valid by QUrl, even if they do not make sense when used:
113
114 \list
115 \li "http:/filename.html"
116 \li "mailto://example.com"
117 \endlist
118
119 When the parser encounters an error, it signals the event by making
120 isValid() return false and toString() / toEncoded() return an empty string.
121 If it is necessary to show the user the reason why the URL failed to parse,
122 the error condition can be obtained from QUrl by calling errorString().
123 Note that this message is highly technical and may not make sense to
124 end-users.
125
126 QUrl is capable of recording only one error condition. If more than one
127 error is found, it is undefined which error is reported.
128
129 \section2 Character Conversions
130
131 Follow these rules to avoid erroneous character conversion when
132 dealing with URLs and strings:
133
134 \list
135 \li When creating a QString to contain a URL from a QByteArray or a
136 char*, always use QString::fromUtf8().
137 \endlist
138*/
139
140/*!
141 \enum QUrl::ParsingMode
142
143 The parsing mode controls the way QUrl parses strings.
144
145 \value TolerantMode QUrl will try to correct some common errors in URLs.
146 This mode is useful for parsing URLs coming from sources
147 not known to be strictly standards-conforming.
148
149 \value StrictMode Only valid URLs are accepted. This mode is useful for
150 general URL validation.
151
152 \value DecodedMode QUrl will interpret the URL component in the fully-decoded form,
153 where percent characters stand for themselves, not as the beginning
154 of a percent-encoded sequence. This mode is only valid for the
155 setters setting components of a URL; it is not permitted in
156 the QUrl constructor, in fromEncoded() or in setUrl().
157 For more information on this mode, see the documentation for
158 \l {QUrl::ComponentFormattingOption}{QUrl::FullyDecoded}.
159
160 In TolerantMode, the parser has the following behaviour:
161
162 \list
163
164 \li Spaces and "%20": unencoded space characters will be accepted and will
165 be treated as equivalent to "%20".
166
167 \li Single "%" characters: Any occurrences of a percent character "%" not
168 followed by exactly two hexadecimal characters (e.g., "13% coverage.html")
169 will be replaced by "%25". Note that one lone "%" character will trigger
170 the correction mode for all percent characters.
171
172 \li Reserved and unreserved characters: An encoded URL should only
173 contain a few characters as literals; all other characters should
174 be percent-encoded. In TolerantMode, these characters will be
175 accepted if they are found in the URL:
176 space / double-quote / "<" / ">" / "\" /
177 "^" / "`" / "{" / "|" / "}"
178 Those same characters can be decoded again by passing QUrl::DecodeReserved
179 to toString() or toEncoded(). In the getters of individual components,
180 those characters are often returned in decoded form.
181
182 \endlist
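
The lone "%" rule in the list above can be sketched as follows. This is a
literal reading of the documented behaviour using only the standard library.
The function name fixLonePercents() is hypothetical, and the real parser may
handle edge cases differently.

```cpp
#include <cstddef>
#include <string>

static bool isHexDigit(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')
        || (c >= 'A' && c <= 'F');
}

// Hypothetical helper: if any '%' is not followed by two hexadecimal
// digits, replace every '%' in the input by "%25"; otherwise return the
// input unchanged.
std::string fixLonePercents(const std::string &input)
{
    bool foundLone = false;
    for (std::size_t i = 0; i < input.size() && !foundLone; ++i) {
        if (input[i] != '%')
            continue;
        const bool wellFormed = i + 2 < input.size()
            && isHexDigit(input[i + 1]) && isHexDigit(input[i + 2]);
        foundLone = !wellFormed;
    }
    if (!foundLone)
        return input;

    // Correction mode: one lone '%' causes all of them to be escaped.
    std::string out;
    for (char c : input) {
        if (c == '%')
            out += "%25";
        else
            out += c;
    }
    return out;
}
```

For instance, "13% coverage.html" becomes "13%25 coverage.html", while the
well-formed "a%20b" is returned unchanged.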
183
184 When in StrictMode, if a parsing error is found, isValid() will return \c
185 false and errorString() will return a message describing the error.
186 If more than one error is detected, it is undefined which error gets
187 reported.
188
189 Note that TolerantMode is not usually enough for parsing user input, which
190 often contains more errors and expectations than the parser can deal with.
191 When dealing with data coming directly from the user -- as opposed to data
192 coming from data-transfer sources, such as other programs -- it is
193 recommended to use fromUserInput().
194
195 \sa fromUserInput(), setUrl(), toString(), toEncoded(), QUrl::FormattingOptions
196*/
197
198/*!
199 \enum QUrl::UrlFormattingOption
200
201 The formatting options define how the URL is formatted when written out
202 as text.
203
204 \value None The format of the URL is unchanged.
205 \value RemoveScheme The scheme is removed from the URL.
206 \value RemovePassword Any password in the URL is removed.
207 \value RemoveUserInfo Any user information in the URL is removed.
208 \value RemovePort Any specified port is removed from the URL.
    \value RemoveAuthority  The authority (user name, password, host and port)
            is removed from the URL.
210 \value RemovePath The URL's path is removed, leaving only the scheme,
211 host address, and port (if present).
212 \value RemoveQuery The query part of the URL (following a '?' character)
213 is removed.
    \value RemoveFragment  The fragment part of the URL (following a '#' character)
            is removed.
215 \value RemoveFilename The filename (i.e. everything after the last '/' in the path) is removed.
216 The trailing '/' is kept, unless StripTrailingSlash is set.
217 Only valid if RemovePath is not set.
218 \value PreferLocalFile If the URL is a local file according to isLocalFile()
219 and contains no query or fragment, a local file path is returned.
220 \value StripTrailingSlash The trailing slash is removed from the path, if one is present.
221 \value NormalizePathSegments Modifies the path to remove redundant directory separators,
222 and to resolve "."s and ".."s (as far as possible). For non-local paths, adjacent
223 slashes are preserved.
224
    Note that the case folding rules in \l{RFC 3491}{Nameprep}, which QUrl
    conforms to, require host names to always be converted to lower case,
    regardless of the QUrl::FormattingOptions used.
228
229 The options from QUrl::ComponentFormattingOptions are also possible.
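
The interaction between RemoveFilename and StripTrailingSlash described above
can be sketched on a plain path string. The enum and the function
formatPath() below are hypothetical stand-ins built on the standard library,
not the QUrl implementation. The handling of a path without any '/' and of a
lone "/" are assumptions made for this sketch.

```cpp
#include <string>

// Hypothetical stand-ins for the two QUrl flags discussed above.
enum PathOption : unsigned {
    RemoveFilename = 0x1,
    StripTrailingSlash = 0x2
};

std::string formatPath(std::string path, unsigned options)
{
    if (options & RemoveFilename) {
        const std::string::size_type slash = path.rfind('/');
        // Assumption: a path without any '/' consists only of a filename.
        path = (slash == std::string::npos) ? std::string()
                                            : path.substr(0, slash + 1);
    }
    // Assumption: a lone "/" is kept so that the root path stays non-empty.
    if ((options & StripTrailingSlash) && path.size() > 1 && path.back() == '/')
        path.pop_back();
    return path;
}
```

With RemoveFilename alone, "/dir/file.txt" yields "/dir/". Adding
StripTrailingSlash yields "/dir".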
230
231 \sa QUrl::ComponentFormattingOptions
232*/
233
234/*!
235 \enum QUrl::ComponentFormattingOption
236 \since 5.0
237
    The component formatting options define how the components of a URL will
    be formatted when written out as text. They can be combined with the
    options from QUrl::FormattingOptions when used in toString() and
    toEncoded().
242
243 \value PrettyDecoded The component is returned in a "pretty form", with
244 most percent-encoded characters decoded. The exact
245 behavior of PrettyDecoded varies from component to
246 component and may also change from Qt release to Qt
247 release. This is the default.
248
249 \value EncodeSpaces Leave space characters in their encoded form ("%20").
250
251 \value EncodeUnicode Leave non-US-ASCII characters encoded in their UTF-8
252 percent-encoded form (e.g., "%C3%A9" for the U+00E9
253 codepoint, LATIN SMALL LETTER E WITH ACUTE).
254
    \value EncodeDelimiters Leave certain delimiters in their encoded form, as
                            would appear in the URL when the full URL is
                            represented as text. The delimiters affected
                            by this option change from component to component.
                            This flag has no effect in toString() or toEncoded().
260
261 \value EncodeReserved Leave US-ASCII characters not permitted in the URL by
262 the specification in their encoded form. This is the
263 default on toString() and toEncoded().
264
265 \value DecodeReserved Decode the US-ASCII characters that the URL specification
266 does not allow to appear in the URL. This is the
267 default on the getters of individual components.
268
    \value FullyEncoded   Leave all characters in their properly-encoded form,
                          as this component would appear as part of a URL. When
                          used with toString(), this produces a fully-compliant
                          URL in QString form, exactly equal to the result of
                          toEncoded().
274
275 \value FullyDecoded Attempt to decode as much as possible. For individual
276 components of the URL, this decodes every percent
277 encoding sequence, including control characters (U+0000
278 to U+001F) and UTF-8 sequences found in percent-encoded form.
279 Use of this mode may cause data loss, see below for more information.
280
    The values of EncodeReserved and DecodeReserved should not be used together
    in one call. The behavior is undefined if that happens. They are provided
    as separate values because the behavior of the "pretty mode" with regard
    to reserved characters is different on certain components and especially on
    the full URL.
286
287 \section2 Full decoding
288
289 The FullyDecoded mode is similar to the behavior of the functions returning
290 QString in Qt 4.x, in that every character represents itself and never has
291 any special meaning. This is true even for the percent character ('%'),
292 which should be interpreted to mean a literal percent, not the beginning of
293 a percent-encoded sequence. The same actual character, in all other
294 decoding modes, is represented by the sequence "%25".
295
296 Whenever re-applying data obtained with QUrl::FullyDecoded into a QUrl,
297 care must be taken to use the QUrl::DecodedMode parameter to the setters
298 (like setPath() and setUserName()). Failure to do so may cause
299 re-interpretation of the percent character ('%') as the beginning of a
300 percent-encoded sequence.
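
The reason for this requirement can be sketched with the standard library. In
a fully-decoded string every '%' is a literal character, so it has to be
escaped as "%25" before the text is stored in a form where '%' introduces a
percent-encoded sequence. The function name decodedToInternal() is
hypothetical, and the real setters perform further processing.

```cpp
#include <string>

// Hypothetical helper: treat every '%' in a fully-decoded component as a
// literal character by escaping it as "%25".
std::string decodedToInternal(const std::string &decoded)
{
    std::string out;
    for (char c : decoded) {
        if (c == '%')
            out += "%25";
        else
            out += c;
    }
    return out;
}
```

A file literally named "report%20final" is thus stored as "report%2520final".
Without this step the "%20" would be re-interpreted as an encoded space.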
301
302 This mode is quite useful when portions of a URL are used in a non-URL
303 context. For example, to extract the username, password or file paths in an
304 FTP client application, the FullyDecoded mode should be used.
305
306 This mode should be used with care, since there are two conditions that
307 cannot be reliably represented in the returned QString. They are:
308
309 \list
310 \li \b{Non-UTF-8 sequences:} URLs may contain sequences of
311 percent-encoded characters that do not form valid UTF-8 sequences. Since
312 URLs need to be decoded using UTF-8, any decoder failure will result in
313 the QString containing one or more replacement characters where the
314 sequence existed.
315
316 \li \b{Encoded delimiters:} URLs are also allowed to make a distinction
317 between a delimiter found in its literal form and its equivalent in
318 percent-encoded form. This is most commonly found in the query, but is
319 permitted in most parts of the URL.
320 \endlist
321
322 The following example illustrates the problem:
323
324 \snippet code/src_corelib_io_qurl.cpp 10
325
    If the two URLs were used via HTTP GET, the interpretation by the web
    server would probably be different. In the first case, it would interpret
    the query as one parameter, with a key of "q" and value "a+=b&c". In the
    second case, it would probably interpret it as two parameters, one with a
    key of "q" and value "a =b", and the second with a key "c" and no value.
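
The difference can be reproduced with a small form-style query parser built
on the standard library. The names formDecode() and parseQuery() are
hypothetical, and the two query strings used in the usage note are
assumptions chosen to match the interpretation described above, since the
referenced snippet is not reproduced here. Real web servers may apply
different rules.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Form-style decoding: '+' becomes a space, "%XX" becomes the byte 0xXX.
std::string formDecode(const std::string &s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '+') {
            out += ' ';
        } else if (s[i] == '%' && i + 2 < s.size()
                   && hexValue(s[i + 1]) >= 0 && hexValue(s[i + 2]) >= 0) {
            out += static_cast<char>(hexValue(s[i + 1]) * 16 + hexValue(s[i + 2]));
            i += 2;
        } else {
            out += s[i];
        }
    }
    return out;
}

using Params = std::vector<std::pair<std::string, std::string>>;

// Split on literal '&' and on the first literal '=' of each item, then
// decode. Delimiters that arrive percent-encoded are therefore data.
Params parseQuery(const std::string &query)
{
    Params params;
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t end = query.find('&', start);
        if (end == std::string::npos)
            end = query.size();
        const std::string item = query.substr(start, end - start);
        if (!item.empty()) {
            const std::size_t eq = item.find('=');
            if (eq == std::string::npos)
                params.emplace_back(formDecode(item), std::string());
            else
                params.emplace_back(formDecode(item.substr(0, eq)),
                                    formDecode(item.substr(eq + 1)));
        }
        start = end + 1;
    }
    return params;
}
```

Parsing "q=a%2B%3Db%26c" yields a single parameter "q" with the value
"a+=b&c", whereas parsing "q=a+=b&c" yields "q" with the value "a =b" and a
second parameter "c" with no value.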
331
332 \sa QUrl::FormattingOptions
333*/
334
335/*!
336 \enum QUrl::UserInputResolutionOption
337 \since 5.4
338
    The user input resolution options define how fromUserInput() should
    interpret strings that could either be a relative path or the short
    form of an HTTP URL. For instance, \c{file.pl} can be either a local file
    or the URL \c{http://file.pl}.
343
344 \value DefaultResolution The default resolution mechanism is to check
345 whether a local file exists, in the working
346 directory given to fromUserInput, and only
347 return a local path in that case. Otherwise a URL
348 is assumed.
349 \value AssumeLocalFile This option makes fromUserInput() always return
350 a local path unless the input contains a scheme, such as
351 \c{http://file.pl}. This is useful for applications
352 such as text editors, which are able to create
353 the file if it doesn't exist.
354
355 \sa fromUserInput()
356*/
357
358/*!
359 \enum QUrl::AceProcessingOption
360 \since 6.3
361
362 The ACE processing options control the way URLs are transformed to and from
363 ASCII-Compatible Encoding.
364
365 \value IgnoreIDNWhitelist Ignore the IDN whitelist when converting URLs
366 to Unicode.
367 \value AceTransitionalProcessing Use transitional processing described in UTS #46.
368 This allows better compatibility with IDNA 2003
369 specification.
370
371 The default is to use nontransitional processing and to allow non-ASCII
372 characters only inside URLs whose top-level domains are listed in the IDN whitelist.
373
374 \sa toAce(), fromAce(), idnWhitelist()
375*/
376
377/*!
378 \fn QUrl::QUrl(QUrl &&other)
379
380 Move-constructs a QUrl instance, making it point at the same
381 object that \a other was pointing to.
382
383 \since 5.2
384*/
385
386/*!
387 \fn QUrl &QUrl::operator=(QUrl &&other)
388
389 Move-assigns \a other to this QUrl instance.
390
391 \since 5.2
392*/
393
394#include "qurl.h"
395#include "qurl_p.h"
396#include "qplatformdefs.h"
397#include "qstring.h"
398#include "qstringlist.h"
399#include "qdebug.h"
400#include "qhash.h"
401#include "qdatastream.h"
402#include "private/qipaddress_p.h"
403#include "qurlquery.h"
404#include "private/qdir_p.h"
405#include <private/qtools_p.h>
406
407QT_BEGIN_NAMESPACE
408
409using namespace Qt::StringLiterals;
410using namespace QtMiscUtils;
411
412inline static bool isHex(char c)
413{
414 c |= 0x20;
415 return isAsciiDigit(c) || (c >= 'a' && c <= 'f');
416}
417
418static inline QString ftpScheme()
419{
420 return QStringLiteral("ftp");
421}
422
423static inline QString fileScheme()
424{
425 return QStringLiteral("file");
426}
427
428static inline QString webDavScheme()
429{
430 return QStringLiteral("webdavs");
431}
432
433static inline QString webDavSslTag()
434{
435 return QStringLiteral("@SSL");
436}
437
438class QUrlPrivate
439{
440public:
441 enum Section : uchar {
442 Scheme = 0x01,
443 UserName = 0x02,
444 Password = 0x04,
445 UserInfo = UserName | Password,
446 Host = 0x08,
447 Port = 0x10,
448 Authority = UserInfo | Host | Port,
449 Path = 0x20,
450 Hierarchy = Authority | Path,
451 Query = 0x40,
452 Fragment = 0x80,
453 FullUrl = 0xff
454 };
455
456 enum Flags : uchar {
457 IsLocalFile = 0x01
458 };
459
460 enum ErrorCode {
461 // the high byte of the error code matches the Section
462 // the first item in each value must be the generic "Invalid xxx Error"
463 InvalidSchemeError = Scheme << 8,
464
465 InvalidUserNameError = UserName << 8,
466
467 InvalidPasswordError = Password << 8,
468
469 InvalidRegNameError = Host << 8,
470 InvalidIPv4AddressError,
471 InvalidIPv6AddressError,
472 InvalidCharacterInIPv6Error,
473 InvalidIPvFutureError,
474 HostMissingEndBracket,
475
476 InvalidPortError = Port << 8,
477 PortEmptyError,
478
479 InvalidPathError = Path << 8,
480
481 InvalidQueryError = Query << 8,
482
483 InvalidFragmentError = Fragment << 8,
484
485 // the following three cases are only possible in combination with
486 // presence/absence of the path, authority and scheme. See validityError().
487 AuthorityPresentAndPathIsRelative = Authority << 8 | Path << 8 | 0x10000,
488 AuthorityAbsentAndPathIsDoubleSlash,
489 RelativeUrlPathContainsColonBeforeSlash = Scheme << 8 | Authority << 8 | Path << 8 | 0x10000,
490
491 NoError = 0
492 };
493
494 struct Error {
495 QString source;
496 qsizetype position;
497 ErrorCode code;
498 };
499
500 QUrlPrivate();
501 QUrlPrivate(const QUrlPrivate &copy);
502 ~QUrlPrivate();
503
504 void parse(const QString &url, QUrl::ParsingMode parsingMode);
505 bool isEmpty() const
506 { return sectionIsPresent == 0 && port == -1 && path.isEmpty(); }
507
508 std::unique_ptr<Error> cloneError() const;
509 void clearError();
510 void setError(ErrorCode errorCode, const QString &source, qsizetype supplement = -1);
511 ErrorCode validityError(QString *source = nullptr, qsizetype *position = nullptr) const;
512 bool validateComponent(Section section, const QString &input, qsizetype begin, qsizetype end);
    bool validateComponent(Section section, const QString &input)
    { return validateComponent(section, input, 0, input.size()); }
515
516 // no QString scheme() const;
517 void appendAuthority(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
518 void appendUserInfo(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
519 void appendUserName(QString &appendTo, QUrl::FormattingOptions options) const;
520 void appendPassword(QString &appendTo, QUrl::FormattingOptions options) const;
521 void appendHost(QString &appendTo, QUrl::FormattingOptions options) const;
522 void appendPath(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
523 void appendQuery(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
524 void appendFragment(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
525
526 // the "end" parameters are like STL iterators: they point to one past the last valid element
527 bool setScheme(const QString &value, qsizetype len, bool doSetError);
528 void setAuthority(const QString &auth, qsizetype from, qsizetype end, QUrl::ParsingMode mode);
529 void setUserInfo(const QString &userInfo, qsizetype from, qsizetype end);
530 void setUserName(const QString &value, qsizetype from, qsizetype end);
531 void setPassword(const QString &value, qsizetype from, qsizetype end);
532 bool setHost(const QString &value, qsizetype from, qsizetype end, QUrl::ParsingMode mode);
533 void setPath(const QString &value, qsizetype from, qsizetype end);
534 void setQuery(const QString &value, qsizetype from, qsizetype end);
535 void setFragment(const QString &value, qsizetype from, qsizetype end);
536
537 inline bool hasScheme() const { return sectionIsPresent & Scheme; }
538 inline bool hasAuthority() const { return sectionIsPresent & Authority; }
539 inline bool hasUserInfo() const { return sectionIsPresent & UserInfo; }
540 inline bool hasUserName() const { return sectionIsPresent & UserName; }
541 inline bool hasPassword() const { return sectionIsPresent & Password; }
542 inline bool hasHost() const { return sectionIsPresent & Host; }
543 inline bool hasPort() const { return port != -1; }
544 inline bool hasPath() const { return !path.isEmpty(); }
545 inline bool hasQuery() const { return sectionIsPresent & Query; }
546 inline bool hasFragment() const { return sectionIsPresent & Fragment; }
547
548 inline bool isLocalFile() const { return flags & IsLocalFile; }
549 QString toLocalFile(QUrl::FormattingOptions options) const;
550
551 QString mergePaths(const QString &relativePath) const;
552
553 QAtomicInt ref;
554 int port;
555
556 QString scheme;
557 QString userName;
558 QString password;
559 QString host;
560 QString path;
561 QString query;
562 QString fragment;
563
564 std::unique_ptr<Error> error;
565
566 // not used for:
567 // - Port (port == -1 means absence)
568 // - Path (there's no path delimiter, so we optimize its use out of existence)
569 // Schemes are never supposed to be empty, but we keep the flag anyway
570 uchar sectionIsPresent;
571 uchar flags;
572
573 // 32-bit: 2 bytes tail padding available
574 // 64-bit: 6 bytes tail padding available
575};
576
577inline QUrlPrivate::QUrlPrivate()
578 : ref(1), port(-1),
579 sectionIsPresent(0),
580 flags(0)
581{
582}
583
584inline QUrlPrivate::QUrlPrivate(const QUrlPrivate &copy)
585 : ref(1), port(copy.port),
586 scheme(copy.scheme),
587 userName(copy.userName),
588 password(copy.password),
589 host(copy.host),
590 path(copy.path),
591 query(copy.query),
592 fragment(copy.fragment),
593 error(copy.cloneError()),
594 sectionIsPresent(copy.sectionIsPresent),
595 flags(copy.flags)
596{
597}
598
599inline QUrlPrivate::~QUrlPrivate()
600 = default;
601
std::unique_ptr<QUrlPrivate::Error> QUrlPrivate::cloneError() const
{
    return error ? std::make_unique<Error>(*error) : nullptr;
}
606
607inline void QUrlPrivate::clearError()
608{
609 error.reset();
610}
611
612inline void QUrlPrivate::setError(ErrorCode errorCode, const QString &source, qsizetype supplement)
613{
614 if (error) {
615 // don't overwrite an error set in a previous section during parsing
616 return;
617 }
618 error = std::make_unique<Error>();
619 error->code = errorCode;
620 error->source = source;
621 error->position = supplement;
622}
623
624// From RFC 3986, Appendix A Collected ABNF for URI
625// URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
626//[...]
627// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
628//
629// authority = [ userinfo "@" ] host [ ":" port ]
630// userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
631// host = IP-literal / IPv4address / reg-name
632// port = *DIGIT
633//[...]
634// reg-name = *( unreserved / pct-encoded / sub-delims )
635//[..]
636// pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
637//
638// query = *( pchar / "/" / "?" )
639//
640// fragment = *( pchar / "/" / "?" )
641//
642// pct-encoded = "%" HEXDIG HEXDIG
643//
644// unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
645// reserved = gen-delims / sub-delims
646// gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
647// sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
648// / "*" / "+" / "," / ";" / "="
649// the path component has a complex ABNF that basically boils down to
650// slash-separated segments of "pchar"
651
// The above is the strict definition of the URL components and we mostly
// adhere to it, with a few exceptions. QUrl obeys the following behavior:
//  - percent-encoding sequences always use uppercase HEXDIG;
//  - unreserved characters are *always* decoded, no exceptions;
//  - the space character and bytes with the high bit set are controlled by
//    the EncodeSpaces and EncodeUnicode bits;
//  - control characters, the percent sign itself, and bytes with the high
//    bit set that don't form valid UTF-8 sequences are always encoded,
//    except in FullyDecoded mode;
//  - sub-delims are always left alone, except in FullyDecoded mode;
//  - gen-delims change behavior depending on which section of the URL (or
//    the entire URL) we're looking at; see below;
//  - characters not mentioned above, like "<" and ">", are usually
//    decoded in individual sections of the URL, but encoded when the full
//    URL is put together (this can change depending on the subjective
//    definition of "pretty").
668//
669// The behavior for the delimiters bears some explanation. The spec says in
670// section 2.2:
671// URIs that differ in the replacement of a reserved character with its
672// corresponding percent-encoded octet are not equivalent.
673// (note: QUrl API mistakenly uses the "reserved" term, so we will refer to
674// them here as "delimiters").
675//
676// For that reason, we cannot encode delimiters found in decoded form and we
677// cannot decode the ones found in encoded form if that would change the
678// interpretation. Conversely, we *can* perform the transformation if it would
679// not change the interpretation. From the last component of a URL to the first,
680// here are the gen-delims we can unambiguously transform when the field is
681// taken in isolation:
682// - fragment: none, since it's the last
683// - query: "#" is unambiguous
684// - path: "#" and "?" are unambiguous
685// - host: completely special but never ambiguous, see setHost() below.
686// - password: the "#", "?", "/", "[", "]" and "@" characters are unambiguous
687// - username: the "#", "?", "/", "[", "]", "@", and ":" characters are unambiguous
688// - scheme: doesn't accept any delimiter, see setScheme() below.
689//
690// Internally, QUrl stores each component in the format that corresponds to the
691// default mode (PrettyDecoded). It deviates from the "strict" FullyEncoded
692// mode in the following way:
693// - spaces are decoded
694// - valid UTF-8 sequences are decoded
695// - gen-delims that can be unambiguously transformed are decoded
696// - characters controlled by DecodeReserved are often decoded, though this behavior
697// can change depending on the subjective definition of "pretty"
698//
699// Note that the list of gen-delims that we can transform is different for the
700// user info (user name + password) and the authority (user info + host +
701// port).
702
703
704// list the recoding table modifications to be used with the recodeFromUser and
705// appendToUser functions, according to the rules above. Spaces and UTF-8
706// sequences are handled outside the tables.
707
708// the encodedXXX tables are run with the delimiters set to "leave" by default;
709// the decodedXXX tables are run with the delimiters set to "decode" by default
710// (except for the query, which doesn't use these functions)
711
712namespace {
713template <typename T> constexpr ushort decode(T x) noexcept { return ushort(x); }
714template <typename T> constexpr ushort leave(T x) noexcept { return ushort(0x100 | x); }
715template <typename T> constexpr ushort encode(T x) noexcept { return ushort(0x200 | x); }
716}
717
static const ushort userNameInIsolation[] = {
    decode(':'), // 0
    decode('@'), // 1
    decode(']'), // 2
    decode('['), // 3
    decode('/'), // 4
    decode('?'), // 5
    decode('#'), // 6

    decode('"'), // 7
    decode('<'),
    decode('>'),
    decode('^'),
    decode('\\'),
    decode('|'),
    decode('{'),
    decode('}'),
    0
};
static const ushort * const passwordInIsolation = userNameInIsolation + 1;
static const ushort * const pathInIsolation = userNameInIsolation + 5;
static const ushort * const queryInIsolation = userNameInIsolation + 6;
static const ushort * const fragmentInIsolation = userNameInIsolation + 7;
741
static const ushort userNameInUserInfo[] = {
    encode(':'), // 0
    decode('@'), // 1
    decode(']'), // 2
    decode('['), // 3
    decode('/'), // 4
    decode('?'), // 5
    decode('#'), // 6

    decode('"'), // 7
    decode('<'),
    decode('>'),
    decode('^'),
    decode('\\'),
    decode('|'),
    decode('{'),
    decode('}'),
    0
};
static const ushort * const passwordInUserInfo = userNameInUserInfo + 1;
762
static const ushort userNameInAuthority[] = {
    encode(':'), // 0
    encode('@'), // 1
    encode(']'), // 2
    encode('['), // 3
    decode('/'), // 4
    decode('?'), // 5
    decode('#'), // 6

    decode('"'), // 7
    decode('<'),
    decode('>'),
    decode('^'),
    decode('\\'),
    decode('|'),
    decode('{'),
    decode('}'),
    0
};
static const ushort * const passwordInAuthority = userNameInAuthority + 1;
783
static const ushort userNameInUrl[] = {
    encode(':'), // 0
    encode('@'), // 1
    encode(']'), // 2
    encode('['), // 3
    encode('/'), // 4
    encode('?'), // 5
    encode('#'), // 6

    // no need to list encode(x) for the other characters
    0
};
static const ushort * const passwordInUrl = userNameInUrl + 1;
static const ushort * const pathInUrl = userNameInUrl + 5;
static const ushort * const queryInUrl = userNameInUrl + 6;
static const ushort * const fragmentInUrl = userNameInUrl + 6;
800
static inline void parseDecodedComponent(QString &data)
{
    // in decoded mode every '%' is a literal character, so escape it
    data.replace(u'%', "%25"_L1);
}
805
806static inline QString
807recodeFromUser(const QString &input, const ushort *actions, qsizetype from, qsizetype to)
808{
809 QString output;
810 const QChar *begin = input.constData() + from;
811 const QChar *end = input.constData() + to;
    if (qt_urlRecode(output, QStringView{begin, end}, {}, actions))
        return output;

    return input.mid(from, to - from);
816}
817
// appendXXXX functions: copy from the internal form to the external, user form.
// The internal value is stored in its PrettyDecoded form, so that case is easy.
820static inline void appendToUser(QString &appendTo, QStringView value, QUrl::FormattingOptions options,
821 const ushort *actions)
822{
823 // The stored value is already QUrl::PrettyDecoded, so there's nothing to
824 // do if that's what the user asked for (test only
825 // ComponentFormattingOptions, ignore FormattingOptions).
826 if ((options & 0xFFFF0000) == QUrl::PrettyDecoded ||
827 !qt_urlRecode(appendTo, url: value, encoding: options, tableModifications: actions))
828 appendTo += value;
829
830 // copy nullness, if necessary, because QString::operator+=(QStringView) doesn't
831 if (appendTo.isNull() && !value.isNull())
832 appendTo.detach();
833}
834
835inline void QUrlPrivate::appendAuthority(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
836{
837 if ((options & QUrl::RemoveUserInfo) != QUrl::RemoveUserInfo) {
838 appendUserInfo(appendTo, options, appendingTo);
839
840 // add '@' only if we added anything
841 if (hasUserName() || (hasPassword() && (options & QUrl::RemovePassword) == 0))
842 appendTo += u'@';
843 }
844 appendHost(appendTo, options);
845 if (!(options & QUrl::RemovePort) && port != -1)
846 appendTo += u':' + QString::number(port);
847}
848
849inline void QUrlPrivate::appendUserInfo(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
850{
851 if (Q_LIKELY(!hasUserInfo()))
852 return;
853
854 const ushort *userNameActions;
855 const ushort *passwordActions;
856 if (options & QUrl::EncodeDelimiters) {
857 userNameActions = userNameInUrl;
858 passwordActions = passwordInUrl;
859 } else {
860 switch (appendingTo) {
861 case UserInfo:
862 userNameActions = userNameInUserInfo;
863 passwordActions = passwordInUserInfo;
864 break;
865
866 case Authority:
867 userNameActions = userNameInAuthority;
868 passwordActions = passwordInAuthority;
869 break;
870
871 case FullUrl:
872 userNameActions = userNameInUrl;
873 passwordActions = passwordInUrl;
874 break;
875
876 default:
877 // can't happen
878 Q_UNREACHABLE();
879 break;
880 }
881 }
882
883 if (!qt_urlRecode(appendTo, url: userName, encoding: options, tableModifications: userNameActions))
884 appendTo += userName;
885 if (options & QUrl::RemovePassword || !hasPassword()) {
886 return;
887 } else {
888 appendTo += u':';
889 if (!qt_urlRecode(appendTo, url: password, encoding: options, tableModifications: passwordActions))
890 appendTo += password;
891 }
892}
893
894inline void QUrlPrivate::appendUserName(QString &appendTo, QUrl::FormattingOptions options) const
895{
896 // only called from QUrl::userName()
897 appendToUser(appendTo, value: userName, options,
898 actions: options & QUrl::EncodeDelimiters ? userNameInUrl : userNameInIsolation);
899}
900
901inline void QUrlPrivate::appendPassword(QString &appendTo, QUrl::FormattingOptions options) const
902{
903 // only called from QUrl::password()
904 appendToUser(appendTo, value: password, options,
905 actions: options & QUrl::EncodeDelimiters ? passwordInUrl : passwordInIsolation);
906}
907
908inline void QUrlPrivate::appendPath(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
909{
910 QString thePath = path;
911 if (options & QUrl::NormalizePathSegments) {
912 thePath = qt_normalizePathSegments(name: path, flags: isLocalFile() ? QDirPrivate::DefaultNormalization : QDirPrivate::RemotePath);
913 }
914
915 QStringView thePathView(thePath);
916 if (options & QUrl::RemoveFilename) {
917 const qsizetype slash = path.lastIndexOf(c: u'/');
918 if (slash == -1)
919 return;
920 thePathView = QStringView{path}.left(n: slash + 1);
921 }
922 // check if we need to remove trailing slashes
923 if (options & QUrl::StripTrailingSlash) {
924 while (thePathView.size() > 1 && thePathView.endsWith(c: u'/'))
925 thePathView.chop(n: 1);
926 }
927
928 appendToUser(appendTo, value: thePathView, options,
929 actions: appendingTo == FullUrl || options & QUrl::EncodeDelimiters ? pathInUrl : pathInIsolation);
930}
931
932inline void QUrlPrivate::appendFragment(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
933{
934 appendToUser(appendTo, value: fragment, options,
935 actions: options & QUrl::EncodeDelimiters ? fragmentInUrl :
936 appendingTo == FullUrl ? nullptr : fragmentInIsolation);
937}
938
939inline void QUrlPrivate::appendQuery(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
940{
941 appendToUser(appendTo, value: query, options,
942 actions: appendingTo == FullUrl || options & QUrl::EncodeDelimiters ? queryInUrl : queryInIsolation);
943}
944
945// setXXX functions
946
947inline bool QUrlPrivate::setScheme(const QString &value, qsizetype len, bool doSetError)
948{
949 // schemes are strictly RFC-compliant:
950 // scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
951 // we also lowercase the scheme
952
953 // schemes in URLs are not allowed to be empty, but they can be in
954 // "Relative URIs" which QUrl also supports. QUrl::setScheme does
955 // not call us with len == 0, so this can only be from parse()
956 scheme.clear();
957 if (len == 0)
958 return false;
959
960 sectionIsPresent |= Scheme;
961
962 // validate it:
963 qsizetype needsLowercasing = -1;
964 const ushort *p = reinterpret_cast<const ushort *>(value.data());
965 for (qsizetype i = 0; i < len; ++i) {
966 if (isAsciiLower(c: p[i]))
967 continue;
968 if (isAsciiUpper(c: p[i])) {
969 needsLowercasing = i;
970 continue;
971 }
972 if (i) {
973 if (isAsciiDigit(c: p[i]))
974 continue;
975 if (p[i] == '+' || p[i] == '-' || p[i] == '.')
976 continue;
977 }
978
979 // found something else
980 // don't call setError needlessly:
981 // if we've been called from parse(), it will try to recover
982 if (doSetError)
983 setError(errorCode: InvalidSchemeError, source: value, supplement: i);
984 return false;
985 }
986
987 scheme = value.left(n: len);
988
989 if (needsLowercasing != -1) {
990 // schemes are ASCII only, so we don't need the full Unicode toLower
991 QChar *schemeData = scheme.data(); // force detaching here
992 for (qsizetype i = needsLowercasing; i >= 0; --i) {
993 ushort c = schemeData[i].unicode();
994 if (isAsciiUpper(c))
995 schemeData[i] = QChar(c + 0x20);
996 }
997 }
998
999 // did we set to the file protocol?
1000 if (scheme == fileScheme()
1001#ifdef Q_OS_WIN
1002 || scheme == webDavScheme()
1003#endif
1004 ) {
1005 flags |= IsLocalFile;
1006 } else {
1007 flags &= ~IsLocalFile;
1008 }
1009 return true;
1010}
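
/*
    The scheme rules enforced by setScheme() above can be restated outside Qt.
    What follows is only a minimal sketch using the C++ standard library, not
    the QUrl implementation: normalizeScheme() is a hypothetical helper, and it
    returns std::nullopt in the cases where setScheme() would return false.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not part of QUrl). Validates the input against
//   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
// and returns the lowercased scheme, or std::nullopt if the input is empty
// or contains a character that the grammar does not allow at that position.
std::optional<std::string> normalizeScheme(std::string_view input)
{
    if (input.empty())
        return std::nullopt;

    std::string result;
    result.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        const char c = input[i];
        if (c >= 'a' && c <= 'z') {
            result += c;
        } else if (c >= 'A' && c <= 'Z') {
            result += char(c + 0x20); // schemes are ASCII only
        } else if (i > 0 && ((c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.')) {
            result += c; // digits and "+-." are allowed after the first character
        } else {
            return std::nullopt;
        }
    }
    return result;
}
```

    Under these assumptions, "HTTP" normalizes to "http" and "svn+ssh" is
    accepted unchanged, while "1http" is rejected because a scheme must start
    with a letter.
*/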
1011
1012inline void QUrlPrivate::setAuthority(const QString &auth, qsizetype from, qsizetype end, QUrl::ParsingMode mode)
1013{
1014 sectionIsPresent &= ~Authority;
1015 sectionIsPresent |= Host;
1016 port = -1;
1017
1018 // we never actually _loop_
1019 while (from != end) {
1020 qsizetype userInfoIndex = auth.indexOf(c: u'@', from);
1021 if (size_t(userInfoIndex) < size_t(end)) {
1022 setUserInfo(userInfo: auth, from, end: userInfoIndex);
1023 if (mode == QUrl::StrictMode && !validateComponent(section: UserInfo, input: auth, begin: from, end: userInfoIndex))
1024 break;
1025 from = userInfoIndex + 1;
1026 }
1027
1028 qsizetype colonIndex = auth.lastIndexOf(c: u':', from: end - 1);
1029 if (colonIndex < from)
1030 colonIndex = -1;
1031
1032 if (size_t(colonIndex) < size_t(end)) {
1033 if (auth.at(i: from).unicode() == '[') {
1034 // check if colonIndex isn't inside the "[...]" part
1035 qsizetype closingBracket = auth.indexOf(c: u']', from);
1036 if (size_t(closingBracket) > size_t(colonIndex))
1037 colonIndex = -1;
1038 }
1039 }
1040
1041 if (size_t(colonIndex) < size_t(end) - 1) {
            // found a colon with at least one character after it;
            // the port is only valid if all of those characters are digits
1043 unsigned long x = 0;
1044 for (qsizetype i = colonIndex + 1; i < end; ++i) {
1045 ushort c = auth.at(i).unicode();
1046 if (isAsciiDigit(c)) {
1047 x *= 10;
1048 x += c - '0';
1049 } else {
1050 x = ulong(-1); // x != ushort(x)
1051 break;
1052 }
1053 }
1054 if (x == ushort(x)) {
1055 port = ushort(x);
1056 } else {
1057 setError(errorCode: InvalidPortError, source: auth, supplement: colonIndex + 1);
1058 if (mode == QUrl::StrictMode)
1059 break;
1060 }
1061 }
1062
1063 setHost(value: auth, from, end: qMin<size_t>(a: end, b: colonIndex), mode);
1064 if (mode == QUrl::StrictMode && !validateComponent(section: Host, input: auth, begin: from, end: qMin<size_t>(a: end, b: colonIndex))) {
1065 // clear host too
1066 sectionIsPresent &= ~Authority;
1067 break;
1068 }
1069
1070 // success
1071 return;
1072 }
1073 // clear all sections but host
1074 sectionIsPresent &= ~Authority | Host;
1075 userName.clear();
1076 password.clear();
1077 host.clear();
1078 port = -1;
1079}
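
/*
    The port handling inside setAuthority() boils down to "all digits, and the
    value fits in 16 bits". A minimal standalone sketch of that rule follows.
    parsePort() is a hypothetical name, and unlike the loop above it stops as
    soon as the value exceeds 65535, so very long digit strings cannot wrap
    around; that early exit is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of QUrl). Parses the text that follows the
// ':' in an authority. Returns std::nullopt for an empty string, for any
// non-digit character, and for values that do not fit in 16 bits.
std::optional<std::uint16_t> parsePort(std::string_view digits)
{
    if (digits.empty())
        return std::nullopt;

    unsigned long value = 0;
    for (const char c : digits) {
        if (c < '0' || c > '9')
            return std::nullopt;
        value = value * 10 + static_cast<unsigned long>(c - '0');
        if (value > 0xFFFF)
            return std::nullopt; // out of range; stop before overflow is possible
    }
    return static_cast<std::uint16_t>(value);
}
```

    With this sketch, "8080" and "0" are accepted, while "65536" and "80a"
    are rejected.
*/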
1080
1081inline void QUrlPrivate::setUserInfo(const QString &userInfo, qsizetype from, qsizetype end)
1082{
1083 qsizetype delimIndex = userInfo.indexOf(c: u':', from);
1084 setUserName(value: userInfo, from, end: qMin<size_t>(a: delimIndex, b: end));
1085
1086 if (size_t(delimIndex) >= size_t(end)) {
1087 password.clear();
1088 sectionIsPresent &= ~Password;
1089 } else {
1090 setPassword(value: userInfo, from: delimIndex + 1, end);
1091 }
1092}
1093
1094inline void QUrlPrivate::setUserName(const QString &value, qsizetype from, qsizetype end)
1095{
1096 sectionIsPresent |= UserName;
1097 userName = recodeFromUser(input: value, actions: userNameInIsolation, from, to: end);
1098}
1099
1100inline void QUrlPrivate::setPassword(const QString &value, qsizetype from, qsizetype end)
1101{
1102 sectionIsPresent |= Password;
1103 password = recodeFromUser(input: value, actions: passwordInIsolation, from, to: end);
1104}
1105
1106inline void QUrlPrivate::setPath(const QString &value, qsizetype from, qsizetype end)
1107{
1108 // sectionIsPresent |= Path; // not used, save some cycles
1109 path = recodeFromUser(input: value, actions: pathInIsolation, from, to: end);
1110}
1111
1112inline void QUrlPrivate::setFragment(const QString &value, qsizetype from, qsizetype end)
1113{
1114 sectionIsPresent |= Fragment;
1115 fragment = recodeFromUser(input: value, actions: fragmentInIsolation, from, to: end);
1116}
1117
1118inline void QUrlPrivate::setQuery(const QString &value, qsizetype from, qsizetype iend)
1119{
1120 sectionIsPresent |= Query;
1121 query = recodeFromUser(input: value, actions: queryInIsolation, from, to: iend);
1122}
1123
// Host handling
// The RFC says the host is:
//    host          = IP-literal / IPv4address / reg-name
//    IP-literal    = "[" ( IPv6address / IPvFuture  ) "]"
//    IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
//  [a strict definition of IPv6Address and IPv4Address]
//     reg-name      = *( unreserved / pct-encoded / sub-delims )
//
// We deviate from the standard in all but IPvFuture. For IPvFuture we accept
// and store only exactly what the RFC says we should. No percent-encoding is
// permitted in this field, so Unicode characters and space aren't either.
//
// For IPv4 addresses, we accept broken addresses like inet_aton does (that is,
// fewer than three dots). However, we correct the address to the proper form
// and store the corrected address. After correction, we comply with the RFC and
// it's exclusively composed of unreserved characters.
//
// For IPv6 addresses, we accept addresses including trailing (embedded) IPv4
// addresses, the so-called v4-compat and v4-mapped addresses. We also store
// those addresses like that in the hostname field, which violates the spec.
// IPv6 hosts are stored with the square brackets in the QString. They also
// require no transformation of any kind.
//
// As for registered names, it's the other way around: we accept only valid
// hostnames as specified by STD 3 and IDNA. That means everything we accept is
// valid in the RFC definition above, but there are many valid reg-names
// according to the RFC that we do not accept in the name of security. Since we
// do accept IDNA, reg-names are subject to ACE encoding and decoding, which is
// specified by the DecodeUnicode flag. The hostname is stored in its Unicode form.
1153
1154inline void QUrlPrivate::appendHost(QString &appendTo, QUrl::FormattingOptions options) const
1155{
1156 if (host.isEmpty())
1157 return;
1158 if (host.at(i: 0).unicode() == '[') {
1159 // IPv6 addresses might contain a zone-id which needs to be recoded
1160 if (options != 0)
1161 if (qt_urlRecode(appendTo, url: host, encoding: options, tableModifications: nullptr))
1162 return;
1163 appendTo += host;
1164 } else {
1165 // this is either an IPv4Address or a reg-name
1166 // if it is a reg-name, it is already stored in Unicode form
1167 if (options & QUrl::EncodeUnicode && !(options & 0x4000000))
1168 appendTo += qt_ACE_do(domain: host, op: ToAceOnly, dot: AllowLeadingDot, options: {});
1169 else
1170 appendTo += host;
1171 }
1172}
1173
1174// the whole IPvFuture is passed and parsed here, including brackets;
1175// returns null if the parsing was successful, or the QChar of the first failure
1176static const QChar *parseIpFuture(QString &host, const QChar *begin, const QChar *end, QUrl::ParsingMode mode)
1177{
1178 // IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
1179 static const char acceptable[] =
1180 "!$&'()*+,;=" // sub-delims
1181 ":" // ":"
1182 "-._~"; // unreserved
1183
1184 // the brackets and the "v" have been checked
1185 const QChar *const origBegin = begin;
1186 if (begin[3].unicode() != '.')
1187 return &begin[3];
1188 if (isHexDigit(c: begin[2].unicode())) {
1189 // this is so unlikely that we'll just go down the slow path
1190 // decode the whole string, skipping the "[vH." and "]" which we already know to be there
1191 host += QStringView(begin, 4);
1192
1193 // uppercase the version, if necessary
1194 if (begin[2].unicode() >= 'a')
1195 host[host.size() - 2] = QChar{begin[2].unicode() - 0x20};
1196
1197 begin += 4;
1198 --end;
1199
1200 QString decoded;
1201 if (mode == QUrl::TolerantMode && qt_urlRecode(appendTo&: decoded, url: QStringView{begin, end}, encoding: QUrl::FullyDecoded, tableModifications: nullptr)) {
1202 begin = decoded.constBegin();
1203 end = decoded.constEnd();
1204 }
1205
1206 for ( ; begin != end; ++begin) {
1207 if (isAsciiLetterOrNumber(c: begin->unicode()))
1208 host += *begin;
1209 else if (begin->unicode() < 0x80 && strchr(s: acceptable, c: begin->unicode()) != nullptr)
1210 host += *begin;
1211 else
1212 return decoded.isEmpty() ? begin : &origBegin[2];
1213 }
1214 host += u']';
1215 return nullptr;
1216 }
1217 return &origBegin[2];
1218}
1219
1220// ONLY the IPv6 address is parsed here, WITHOUT the brackets
1221static const QChar *parseIp6(QString &host, const QChar *begin, const QChar *end, QUrl::ParsingMode mode)
1222{
1223 QStringView decoded(begin, end);
1224 QString decodedBuffer;
1225 if (mode == QUrl::TolerantMode) {
1226 // this struct is kept in automatic storage because it's only 4 bytes
1227 const ushort decodeColon[] = { decode(x: ':'), 0 };
1228 if (qt_urlRecode(appendTo&: decodedBuffer, url: decoded, encoding: QUrl::ComponentFormattingOption::PrettyDecoded, tableModifications: decodeColon))
1229 decoded = decodedBuffer;
1230 }
1231
1232 const QStringView zoneIdIdentifier(u"%25");
1233 QIPAddressUtils::IPv6Address address;
1234 QStringView zoneId;
1235
1236 qsizetype zoneIdPosition = decoded.indexOf(s: zoneIdIdentifier);
1237 if ((zoneIdPosition != -1) && (decoded.lastIndexOf(s: zoneIdIdentifier) == zoneIdPosition)) {
1238 zoneId = decoded.mid(pos: zoneIdPosition + zoneIdIdentifier.size());
1239 decoded.truncate(n: zoneIdPosition);
1240
1241 // was there anything after the zone ID separator?
1242 if (zoneId.isEmpty())
1243 return end;
1244 }
1245
1246 // did the address become empty after removing the zone ID?
1247 // (it might have always been empty)
1248 if (decoded.isEmpty())
1249 return end;
1250
1251 const QChar *ret = QIPAddressUtils::parseIp6(address, begin: decoded.constBegin(), end: decoded.constEnd());
1252 if (ret)
1253 return begin + (ret - decoded.constBegin());
1254
1255 host.reserve(asize: host.size() + (end - begin) + 2); // +2 for the brackets
1256 host += u'[';
1257 QIPAddressUtils::toString(appendTo&: host, address);
1258
1259 if (!zoneId.isEmpty()) {
1260 host += zoneIdIdentifier;
1261 host += zoneId;
1262 }
1263 host += u']';
1264 return nullptr;
1265}
1266
1267inline bool
1268QUrlPrivate::setHost(const QString &value, qsizetype from, qsizetype iend, QUrl::ParsingMode mode)
1269{
1270 const QChar *begin = value.constData() + from;
1271 const QChar *end = value.constData() + iend;
1272
1273 const qsizetype len = end - begin;
1274 host.clear();
1275 sectionIsPresent |= Host;
1276 if (len == 0)
1277 return true;
1278
1279 if (begin[0].unicode() == '[') {
1280 // IPv6Address or IPvFuture
1281 // smallest IPv6 address is "[::]" (len = 4)
1282 // smallest IPvFuture address is "[v7.X]" (len = 6)
1283 if (end[-1].unicode() != ']') {
1284 setError(errorCode: HostMissingEndBracket, source: value);
1285 return false;
1286 }
1287
1288 if (len > 5 && begin[1].unicode() == 'v') {
1289 const QChar *c = parseIpFuture(host, begin, end, mode);
1290 if (c)
1291 setError(errorCode: InvalidIPvFutureError, source: value, supplement: c - value.constData());
1292 return !c;
1293 } else if (begin[1].unicode() == 'v') {
1294 setError(errorCode: InvalidIPvFutureError, source: value, supplement: from);
1295 }
1296
1297 const QChar *c = parseIp6(host, begin: begin + 1, end: end - 1, mode);
1298 if (!c)
1299 return true;
1300
1301 if (c == end - 1)
1302 setError(errorCode: InvalidIPv6AddressError, source: value, supplement: from);
1303 else
1304 setError(errorCode: InvalidCharacterInIPv6Error, source: value, supplement: c - value.constData());
1305 return false;
1306 }
1307
1308 // check if it's an IPv4 address
1309 QIPAddressUtils::IPv4Address ip4;
1310 if (QIPAddressUtils::parseIp4(address&: ip4, begin, end)) {
1311 // yes, it was
1312 QIPAddressUtils::toString(appendTo&: host, address: ip4);
1313 return true;
1314 }
1315
    // This is probably a reg-name.
    // But it can also be an encoded string that, when decoded, becomes one
    // of the types above.
    //
    // Two types of encoding are possible:
    //  percent encoding (e.g., "%31%30%2E%30%2E%30%2E%31" -> "10.0.0.1")
    //  Unicode encoding (some non-ASCII characters case-fold to digits
    //                    when nameprepping is done)
    //
    // The qt_ACE_do function below does IDNA normalization and the STD3 check.
    // That means a Unicode string may become an IPv4 address, but it cannot
    // produce a '[' or a '%'.
1328
1329 // check for percent-encoding first
1330 QString s;
1331 if (mode == QUrl::TolerantMode && qt_urlRecode(appendTo&: s, url: QStringView{begin, end}, encoding: { }, tableModifications: nullptr)) {
1332 // something was decoded
1333 // anything encoded left?
1334 qsizetype pos = s.indexOf(c: QChar(0x25)); // '%'
1335 if (pos != -1) {
1336 setError(errorCode: InvalidRegNameError, source: s, supplement: pos);
1337 return false;
1338 }
1339
1340 // recurse
1341 return setHost(value: s, from: 0, iend: s.size(), mode: QUrl::StrictMode);
1342 }
1343
1344 s = qt_ACE_do(domain: value.mid(position: from, n: iend - from), op: NormalizeAce, dot: ForbidLeadingDot, options: {});
1345 if (s.isEmpty()) {
1346 setError(errorCode: InvalidRegNameError, source: value);
1347 return false;
1348 }
1349
1350 // check IPv4 again
1351 if (QIPAddressUtils::parseIp4(address&: ip4, begin: s.constBegin(), end: s.constEnd())) {
1352 QIPAddressUtils::toString(appendTo&: host, address: ip4);
1353 } else {
1354 host = s;
1355 }
1356 return true;
1357}
1358
1359inline void QUrlPrivate::parse(const QString &url, QUrl::ParsingMode parsingMode)
1360{
1361 // URI-reference = URI / relative-ref
1362 // URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
1363 // relative-ref = relative-part [ "?" query ] [ "#" fragment ]
1364 // hier-part = "//" authority path-abempty
1365 // / other path types
1366 // relative-part = "//" authority path-abempty
1367 // / other path types here
1368
1369 sectionIsPresent = 0;
1370 flags = 0;
1371 clearError();
1372
1373 // find the important delimiters
1374 qsizetype colon = -1;
1375 qsizetype question = -1;
1376 qsizetype hash = -1;
1377 const qsizetype len = url.size();
1378 const QChar *const begin = url.constData();
1379 const ushort *const data = reinterpret_cast<const ushort *>(begin);
1380
1381 for (qsizetype i = 0; i < len; ++i) {
1382 size_t uc = data[i];
1383 if (uc == '#' && hash == -1) {
1384 hash = i;
1385
1386 // nothing more to be found
1387 break;
1388 }
1389
1390 if (question == -1) {
1391 if (uc == ':' && colon == -1)
1392 colon = i;
1393 else if (uc == '?')
1394 question = i;
1395 }
1396 }
1397
1398 // check if we have a scheme
1399 qsizetype hierStart;
1400 if (colon != -1 && setScheme(value: url, len: colon, /* don't set error */ doSetError: false)) {
1401 hierStart = colon + 1;
1402 } else {
1403 // recover from a failed scheme: it might not have been a scheme at all
1404 scheme.clear();
1405 sectionIsPresent = 0;
1406 hierStart = 0;
1407 }
1408
1409 qsizetype pathStart;
1410 qsizetype hierEnd = qMin<size_t>(a: qMin<size_t>(a: question, b: hash), b: len);
1411 if (hierEnd - hierStart >= 2 && data[hierStart] == '/' && data[hierStart + 1] == '/') {
1412 // we have an authority, it ends at the first slash after these
1413 qsizetype authorityEnd = hierEnd;
1414 for (qsizetype i = hierStart + 2; i < authorityEnd ; ++i) {
1415 if (data[i] == '/') {
1416 authorityEnd = i;
1417 break;
1418 }
1419 }
1420
1421 setAuthority(auth: url, from: hierStart + 2, end: authorityEnd, mode: parsingMode);
1422
1423 // even if we failed to set the authority properly, let's try to recover
1424 pathStart = authorityEnd;
1425 setPath(value: url, from: pathStart, end: hierEnd);
1426 } else {
1427 userName.clear();
1428 password.clear();
1429 host.clear();
1430 port = -1;
1431 pathStart = hierStart;
1432
1433 if (hierStart < hierEnd)
1434 setPath(value: url, from: hierStart, end: hierEnd);
1435 else
1436 path.clear();
1437 }
1438
1439 if (size_t(question) < size_t(hash))
1440 setQuery(value: url, from: question + 1, iend: qMin<size_t>(a: hash, b: len));
1441
1442 if (hash != -1)
1443 setFragment(value: url, from: hash + 1, end: len);
1444
1445 if (error || parsingMode == QUrl::TolerantMode)
1446 return;
1447
1448 // The parsing so far was partially tolerant of errors, except for the
1449 // scheme parser (which is always strict) and the authority (which was
1450 // executed in strict mode).
1451 // If we haven't found any errors so far, continue the strict-mode parsing
1452 // from the path component onwards.
1453
1454 if (!validateComponent(section: Path, input: url, begin: pathStart, end: hierEnd))
1455 return;
1456 if (size_t(question) < size_t(hash) && !validateComponent(section: Query, input: url, begin: question + 1, end: qMin<size_t>(a: hash, b: len)))
1457 return;
1458 if (hash != -1)
1459 validateComponent(section: Fragment, input: url, begin: hash + 1, end: len);
1460}
1461
QString QUrlPrivate::toLocalFile(QUrl::FormattingOptions options) const
{
    QString tmp;
    QString ourPath;
    appendPath(ourPath, options, QUrlPrivate::Path);

    // magic for shared drives on Windows
    if (!host.isEmpty()) {
        tmp = "//"_L1 + host;
#ifdef Q_OS_WIN // QTBUG-42346, WebDAV is visible as local file on Windows only.
        if (scheme == webDavScheme())
            tmp += webDavSslTag();
#endif
        if (!ourPath.isEmpty() && !ourPath.startsWith(u'/'))
            tmp += u'/';
        tmp += ourPath;
    } else {
        tmp = ourPath;
#ifdef Q_OS_WIN
        // magic for drives on Windows
        if (ourPath.length() > 2 && ourPath.at(0) == u'/' && ourPath.at(2) == u':')
            tmp.remove(0, 1);
#endif
    }
    return tmp;
}
1488
1489/*
1490 From http://www.ietf.org/rfc/rfc3986.txt, 5.2.3: Merge paths
1491
1492 Returns a merge of the current path with the relative path passed
1493 as argument.
1494
1495 Note: \a relativePath is relative (does not start with '/').
1496*/
1497inline QString QUrlPrivate::mergePaths(const QString &relativePath) const
1498{
1499 // If the base URI has a defined authority component and an empty
1500 // path, then return a string consisting of "/" concatenated with
1501 // the reference's path; otherwise,
1502 if (!host.isEmpty() && path.isEmpty())
1503 return u'/' + relativePath;
1504
1505 // Return a string consisting of the reference's path component
1506 // appended to all but the last segment of the base URI's path
1507 // (i.e., excluding any characters after the right-most "/" in the
1508 // base URI path, or excluding the entire base URI path if it does
1509 // not contain any "/" characters).
1510 QString newPath;
1511 if (!path.contains(c: u'/'))
1512 newPath = relativePath;
1513 else
1514 newPath = QStringView{path}.left(n: path.lastIndexOf(c: u'/') + 1) + relativePath;
1515
1516 return newPath;
1517}
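
/*
    For readers without Qt at hand, the merge rule of RFC 3986, section 5.2.3
    can be sketched with std::string. This is an illustration only:
    mergePathsSketch() is a hypothetical free function, and its
    baseHasAuthority parameter stands in for the "host is not empty" test
    used by mergePaths() above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of QUrl), following RFC 3986, 5.2.3.
// relativePath is assumed not to start with '/'.
std::string mergePathsSketch(bool baseHasAuthority, const std::string &basePath,
                             const std::string &relativePath)
{
    // Base has an authority and an empty path: result is "/" + reference path.
    if (baseHasAuthority && basePath.empty())
        return "/" + relativePath;

    // Otherwise keep everything up to and including the right-most '/' of
    // the base path, or nothing at all if the base path has no '/'.
    const std::string::size_type slash = basePath.rfind('/');
    if (slash == std::string::npos)
        return relativePath;
    return basePath.substr(0, slash + 1) + relativePath;
}
```

    Taking the RFC's example base path "/b/c/d;p" and the reference "g", the
    sketch yields "/b/c/g".
*/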
1518
1519/*
1520 From http://www.ietf.org/rfc/rfc3986.txt, 5.2.4: Remove dot segments
1521
1522 Removes unnecessary ../ and ./ from the path. Used for normalizing
1523 the URL.
1524*/
1525static void removeDotsFromPath(QString *path)
1526{
1527 // The input buffer is initialized with the now-appended path
1528 // components and the output buffer is initialized to the empty
1529 // string.
1530 QChar *out = path->data();
1531 const QChar *in = out;
1532 const QChar *end = out + path->size();
1533
1534 // If the input buffer consists only of
1535 // "." or "..", then remove that from the input
1536 // buffer;
1537 if (path->size() == 1 && in[0].unicode() == '.')
1538 ++in;
1539 else if (path->size() == 2 && in[0].unicode() == '.' && in[1].unicode() == '.')
1540 in += 2;
1541 // While the input buffer is not empty, loop:
1542 while (in < end) {
1543
1544 // otherwise, if the input buffer begins with a prefix of "../" or "./",
1545 // then remove that prefix from the input buffer;
1546 if (path->size() >= 2 && in[0].unicode() == '.' && in[1].unicode() == '/')
1547 in += 2;
1548 else if (path->size() >= 3 && in[0].unicode() == '.'
1549 && in[1].unicode() == '.' && in[2].unicode() == '/')
1550 in += 3;
1551
1552 // otherwise, if the input buffer begins with a prefix of
1553 // "/./" or "/.", where "." is a complete path segment,
1554 // then replace that prefix with "/" in the input buffer;
1555 if (in <= end - 3 && in[0].unicode() == '/' && in[1].unicode() == '.'
1556 && in[2].unicode() == '/') {
1557 in += 2;
1558 continue;
1559 } else if (in == end - 2 && in[0].unicode() == '/' && in[1].unicode() == '.') {
1560 *out++ = u'/';
1561 in += 2;
1562 break;
1563 }
1564
        // otherwise, if the input buffer begins with a prefix
        // of "/../" or "/..", where ".." is a complete path
        // segment, then replace that prefix with "/" in the
        // input buffer and remove the last segment and its
        // preceding "/" (if any) from the output buffer;
1570 if (in <= end - 4 && in[0].unicode() == '/' && in[1].unicode() == '.'
1571 && in[2].unicode() == '.' && in[3].unicode() == '/') {
1572 while (out > path->constData() && (--out)->unicode() != '/')
1573 ;
1574 if (out == path->constData() && out->unicode() != '/')
1575 ++in;
1576 in += 3;
1577 continue;
1578 } else if (in == end - 3 && in[0].unicode() == '/' && in[1].unicode() == '.'
1579 && in[2].unicode() == '.') {
1580 while (out > path->constData() && (--out)->unicode() != '/')
1581 ;
1582 if (out->unicode() == '/')
1583 ++out;
1584 in += 3;
1585 break;
1586 }
1587
1588 // otherwise move the first path segment in
1589 // the input buffer to the end of the output
1590 // buffer, including the initial "/" character
1591 // (if any) and any subsequent characters up
1592 // to, but not including, the next "/"
1593 // character or the end of the input buffer.
1594 *out++ = *in++;
1595 while (in < end && in->unicode() != '/')
1596 *out++ = *in++;
1597 }
1598 path->truncate(pos: out - path->constData());
1599}
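
/*
    The in-place pointer manipulation above can be hard to follow. As a
    reading aid, here is a direct transcription of the RFC 3986, section 5.2.4
    algorithm using two separate std::string buffers. It is a sketch, not the
    QUrl code: removeDotSegments() is a hypothetical name, and it favors
    clarity over speed by erasing from the front of the input buffer.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of QUrl), following RFC 3986, 5.2.4.
std::string removeDotSegments(std::string in)
{
    std::string out;
    auto startsWith = [&in](const char *prefix) { return in.rfind(prefix, 0) == 0; };
    // Removes the last segment of the output and its preceding '/', if any.
    auto dropLastSegment = [&out] {
        const std::string::size_type slash = out.rfind('/');
        out.erase(slash == std::string::npos ? 0 : slash);
    };

    while (!in.empty()) {
        if (startsWith("../")) {
            in.erase(0, 3);             // step A
        } else if (startsWith("./")) {
            in.erase(0, 2);             // step A
        } else if (startsWith("/./")) {
            in.erase(0, 2);             // step B: "/./" becomes "/"
        } else if (in == "/.") {
            in = "/";                   // step B
        } else if (startsWith("/../")) {
            in.erase(0, 3);             // step C: "/../" becomes "/"
            dropLastSegment();
        } else if (in == "/..") {
            in = "/";                   // step C
            dropLastSegment();
        } else if (in == "." || in == "..") {
            in.clear();                 // step D
        } else {
            // step E: move the first segment, including its leading '/'
            // (if any), up to but not including the next '/'.
            const std::string::size_type next = in.find('/', 1);
            out += in.substr(0, next);
            in.erase(0, next);
        }
    }
    return out;
}
```

    The two worked examples from the RFC hold for this sketch:
    "/a/b/c/./../../g" becomes "/a/g" and "mid/content=5/../6" becomes "mid/6".
*/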
1600
1601inline QUrlPrivate::ErrorCode QUrlPrivate::validityError(QString *source, qsizetype *position) const
1602{
1603 Q_ASSERT(!source == !position);
1604 if (error) {
1605 if (source) {
1606 *source = error->source;
1607 *position = error->position;
1608 }
1609 return error->code;
1610 }
1611
1612 // There are three more cases of invalid URLs that QUrl recognizes and they
1613 // are only possible with constructed URLs (setXXX methods), not with
1614 // parsing. Therefore, they are tested here.
1615 //
1616 // Two cases are a non-empty path that doesn't start with a slash and:
1617 // - with an authority
1618 // - without an authority, without scheme but the path with a colon before
1619 // the first slash
1620 // The third case is an empty authority and a non-empty path that starts
1621 // with "//".
1622 // Those cases are considered invalid because toString() would produce a URL
1623 // that wouldn't be parsed back to the same QUrl.
1624
1625 if (path.isEmpty())
1626 return NoError;
1627 if (path.at(i: 0) == u'/') {
1628 if (hasAuthority() || path.size() == 1 || path.at(i: 1) != u'/')
1629 return NoError;
1630 if (source) {
1631 *source = path;
1632 *position = 0;
1633 }
1634 return AuthorityAbsentAndPathIsDoubleSlash;
1635 }
1636
1637 if (sectionIsPresent & QUrlPrivate::Host) {
1638 if (source) {
1639 *source = path;
1640 *position = 0;
1641 }
1642 return AuthorityPresentAndPathIsRelative;
1643 }
1644 if (sectionIsPresent & QUrlPrivate::Scheme)
1645 return NoError;
1646
1647 // check for a path of "text:text/"
1648 for (qsizetype i = 0; i < path.size(); ++i) {
1649 ushort c = path.at(i).unicode();
1650 if (c == '/') {
1651 // found the slash before the colon
1652 return NoError;
1653 }
1654 if (c == ':') {
1655 // found the colon before the slash, it's invalid
1656 if (source) {
1657 *source = path;
1658 *position = i;
1659 }
1660 return RelativeUrlPathContainsColonBeforeSlash;
1661 }
1662 }
1663 return NoError;
1664}
1665
1666bool QUrlPrivate::validateComponent(QUrlPrivate::Section section, const QString &input,
1667 qsizetype begin, qsizetype end)
1668{
    // What we need to look out for, that the regular parser tolerates:
    //  - percent signs not followed by two hex digits
    //  - forbidden characters, which should always appear encoded
    //    '"' / '<' / '>' / '\' / '^' / '`' / '{' / '|' / '}' / DEL
    //    control characters
    //  - delimiters not allowed in certain positions
    //    . scheme: parser is already strict
    //    . user info: gen-delims except ":" disallowed ("/" / "?" / "#" / "[" / "]" / "@")
    //    . host: parser is stricter than the standard
    //    . port: parser is stricter than the standard
    //    . path: all delimiters allowed
    //    . fragment: all delimiters allowed
    //    . query: all delimiters allowed
1682 static const char forbidden[] = "\"<>\\^`{|}\x7F";
1683 static const char forbiddenUserInfo[] = ":/?#[]@";
1684
1685 Q_ASSERT(section != Authority && section != Hierarchy && section != FullUrl);
1686
1687 const ushort *const data = reinterpret_cast<const ushort *>(input.constData());
1688 for (size_t i = size_t(begin); i < size_t(end); ++i) {
1689 uint uc = data[i];
1690 if (uc >= 0x80)
1691 continue;
1692
1693 bool error = false;
1694 if ((uc == '%' && (size_t(end) < i + 2 || !isHex(c: data[i + 1]) || !isHex(c: data[i + 2])))
1695 || uc <= 0x20 || strchr(s: forbidden, c: uc)) {
1696 // found an error
1697 error = true;
1698 } else if (section & UserInfo) {
1699 if (section == UserInfo && strchr(s: forbiddenUserInfo + 1, c: uc))
1700 error = true;
1701 else if (section != UserInfo && strchr(s: forbiddenUserInfo, c: uc))
1702 error = true;
1703 }
1704
1705 if (!error)
1706 continue;
1707
1708 ErrorCode errorCode = ErrorCode(int(section) << 8);
1709 if (section == UserInfo) {
1710 // is it the user name or the password?
1711 errorCode = InvalidUserNameError;
1712 for (size_t j = size_t(begin); j < i; ++j)
1713 if (data[j] == ':') {
1714 errorCode = InvalidPasswordError;
1715 break;
1716 }
1717 }
1718
        setError(errorCode, input, i);
1720 return false;
1721 }
1722
1723 // no errors
1724 return true;
1725}
1726
1727#if 0
1728inline void QUrlPrivate::validate() const
1729{
1730 QUrlPrivate *that = (QUrlPrivate *)this;
1731 that->encodedOriginal = that->toEncoded(); // may detach
1732 parse(ParseOnly);
1733
1734 QURL_SETFLAG(that->stateFlags, Validated);
1735
1736 if (!isValid)
1737 return;
1738
1739 QString auth = authority(); // causes the non-encoded forms to be valid
1740
1741 // authority() calls canonicalHost() which sets this
1742 if (!isHostValid)
1743 return;
1744
1745 if (scheme == "mailto"_L1) {
1746 if (!host.isEmpty() || port != -1 || !userName.isEmpty() || !password.isEmpty()) {
1747 that->isValid = false;
1748 that->errorInfo.setParams(0, QT_TRANSLATE_NOOP(QUrl, "expected empty host, username,"
1749 "port and password"),
1750 0, 0);
1751 }
1752 } else if (scheme == ftpScheme() || scheme == httpScheme()) {
1753 if (host.isEmpty() && !(path.isEmpty() && encodedPath.isEmpty())) {
1754 that->isValid = false;
1755 that->errorInfo.setParams(0, QT_TRANSLATE_NOOP(QUrl, "the host is empty, but not the path"),
1756 0, 0);
1757 }
1758 }
1759}
1760#endif
1761
1762/*!
1763 \macro QT_NO_URL_CAST_FROM_STRING
1764 \relates QUrl
1765
1766 Disables automatic conversions from QString (or char *) to QUrl.
1767
1768 Compiling your code with this define is useful when you have a lot of
1769 code that uses QString for file names and you wish to convert it to
1770 use QUrl for network transparency. In any code that uses QUrl, it can
1771 help avoid missing QUrl::resolved() calls, and other misuses of
1772 QString to QUrl conversions.
1773
1774 For example, if you have code like
1775
1776 \code
1777 url = filename; // probably not what you want
1778 \endcode
1779
1780 you can rewrite it as
1781
1782 \code
1783 url = QUrl::fromLocalFile(filename);
1784 url = baseurl.resolved(QUrl(filename));
1785 \endcode
1786
1787 \sa QT_NO_CAST_FROM_ASCII
1788*/
1789
1790
1791/*!
1792 Constructs a URL by parsing \a url. Note this constructor expects a proper
1793 URL or URL-Reference and will not attempt to guess intent. For example, the
1794 following declaration:
1795
1796 \snippet code/src_corelib_io_qurl.cpp constructor-url-reference
1797
1798 Will construct a valid URL but it may not be what one expects, as the
1799 scheme() part of the input is missing. For a string like the above,
1800 applications may want to use fromUserInput(). For this constructor or
1801 setUrl(), the following is probably what was intended:
1802
1803 \snippet code/src_corelib_io_qurl.cpp constructor-url
1804
1805 QUrl will automatically percent encode
1806 all characters that are not allowed in a URL and decode the percent-encoded
1807 sequences that represent an unreserved character (letters, digits, hyphens,
1808 underscores, dots and tildes). All other characters are left in their
1809 original forms.
1810
1811 Parses the \a url using the parser mode \a parsingMode. In TolerantMode
1812 (the default), QUrl will correct certain mistakes, notably the presence of
1813 a percent character ('%') not followed by two hexadecimal digits, and it
1814 will accept any character in any position. In StrictMode, encoding mistakes
1815 will not be tolerated and QUrl will also check that certain forbidden
1816 characters are not present in unencoded form. If an error is detected in
1817 StrictMode, isValid() will return false. The parsing mode DecodedMode is not
1818 permitted in this context.
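
    As an illustration of what StrictMode rejects, the following standalone
    sketch reports the first offending character in a single component. It
    uses std::string_view instead of QString, and the helper name
    firstStrictError() is hypothetical. It follows the same character list as
    the private validator in this file, but it is not the QUrl implementation
    and it ignores the extra per-component delimiter rules.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not part of QUrl: returns the index of the first
// character that a strict check would reject, or std::string::npos if the
// component passes.
std::size_t firstStrictError(std::string_view component)
{
    // Characters that must always appear percent-encoded (0x7F is DEL).
    static const std::string forbidden = "\"<>\\^`{|}\x7F";

    for (std::size_t i = 0; i < component.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(component[i]);
        if (c >= 0x80)
            continue; // non-ASCII input is not checked here

        if (c == '%') {
            // A percent sign must be followed by exactly two hex digits.
            const bool twoHex = i + 2 < component.size()
                    && std::isxdigit(static_cast<unsigned char>(component[i + 1]))
                    && std::isxdigit(static_cast<unsigned char>(component[i + 2]));
            if (!twoHex)
                return i;
        } else if (c <= 0x20 || forbidden.find(static_cast<char>(c)) != std::string::npos) {
            // Control characters, space, and the forbidden set.
            return i;
        }
    }
    return std::string::npos;
}
```

    With this sketch, "List%20of%20applicants.xml" passes, whereas "100%" and
    "a b" are rejected at the '%' and at the space respectively.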
1819
1820 Example:
1821
1822 \snippet code/src_corelib_io_qurl.cpp 0
1823
1824 To construct a URL from an encoded string, you can also use fromEncoded():
1825
1826 \snippet code/src_corelib_io_qurl.cpp 1
1827
1828 Both functions are equivalent and, in Qt 5, both functions accept encoded
1829 data. Usually, the choice of the QUrl constructor or setUrl() versus
1830 fromEncoded() will depend on the source data: the constructor and setUrl()
    take a QString, whereas fromEncoded() takes a QByteArray.
1832
1833 \sa setUrl(), fromEncoded(), TolerantMode
1834*/
1835QUrl::QUrl(const QString &url, ParsingMode parsingMode) : d(nullptr)
1836{
    setUrl(url, parsingMode);
1838}
1839
1840/*!
1841 Constructs an empty QUrl object.
1842*/
1843QUrl::QUrl() : d(nullptr)
1844{
1845}
1846
1847/*!
1848 Constructs a copy of \a other.
1849*/
1850QUrl::QUrl(const QUrl &other) noexcept : d(other.d)
1851{
1852 if (d)
1853 d->ref.ref();
1854}
1855
1856/*!
1857 Destructor; called immediately before the object is deleted.
1858*/
1859QUrl::~QUrl()
1860{
1861 if (d && !d->ref.deref())
1862 delete d;
1863}
1864
1865/*!
1866 Returns \c true if the URL is non-empty and valid; otherwise returns \c false.
1867
1868 The URL is run through a conformance test. Every part of the URL
1869 must conform to the standard encoding rules of the URI standard
1870 for the URL to be reported as valid.
1871
1872 \snippet code/src_corelib_io_qurl.cpp 2
1873*/
1874bool QUrl::isValid() const
1875{
1876 if (isEmpty()) {
1877 // also catches d == nullptr
1878 return false;
1879 }
1880 return d->validityError() == QUrlPrivate::NoError;
1881}
1882
1883/*!
1884 Returns \c true if the URL has no data; otherwise returns \c false.
1885
1886 \sa clear()
1887*/
1888bool QUrl::isEmpty() const
1889{
1890 if (!d) return true;
1891 return d->isEmpty();
1892}
1893
1894/*!
1895 Resets the content of the QUrl. After calling this function, the
1896 QUrl is equal to one that has been constructed with the default
1897 empty constructor.
1898
1899 \sa isEmpty()
1900*/
1901void QUrl::clear()
1902{
1903 if (d && !d->ref.deref())
1904 delete d;
1905 d = nullptr;
1906}
1907
1908/*!
1909 Parses \a url and sets this object to that value. QUrl will automatically
1910 percent encode all characters that are not allowed in a URL and decode the
1911 percent-encoded sequences that represent an unreserved character (letters,
1912 digits, hyphens, underscores, dots and tildes). All other characters are
1913 left in their original forms.
1914
1915 Parses the \a url using the parser mode \a parsingMode. In TolerantMode
1916 (the default), QUrl will correct certain mistakes, notably the presence of
1917 a percent character ('%') not followed by two hexadecimal digits, and it
1918 will accept any character in any position. In StrictMode, encoding mistakes
1919 will not be tolerated and QUrl will also check that certain forbidden
1920 characters are not present in unencoded form. If an error is detected in
1921 StrictMode, isValid() will return false. The parsing mode DecodedMode is
1922 not permitted in this context and will produce a run-time warning.
1923
1924 \sa url(), toString()
1925*/
1926void QUrl::setUrl(const QString &url, ParsingMode parsingMode)
1927{
1928 if (parsingMode == DecodedMode) {
        qWarning("QUrl: QUrl::DecodedMode is not permitted when parsing a full URL");
1930 } else {
1931 detach();
1932 d->parse(url, parsingMode);
1933 }
1934}
1935
1936/*!
1937 Sets the scheme of the URL to \a scheme. As a scheme can only
1938 contain ASCII characters, no conversion or decoding is done on the
1939 input. It must also start with an ASCII letter.
1940
1941 The scheme describes the type (or protocol) of the URL. It's
    represented by one or more ASCII characters at the start of the URL.
1943
1944 A scheme is strictly \l {RFC 3986}-compliant:
1945 \tt {scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )}
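
    The grammar above can be checked with a few lines of code. The following
    is a minimal sketch that uses std::string_view rather than QString; the
    function name looksLikeScheme() is hypothetical and is not part of the
    QUrl API. Note that it tests the grammar only: an empty string does not
    match it, even though setScheme() accepts an empty scheme to make the URL
    relative.

```cpp
#include <string_view>

// Hypothetical helper: true if s matches
//   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
bool looksLikeScheme(std::string_view s)
{
    auto isAsciiAlpha = [](char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    };
    auto isAsciiDigit = [](char c) { return c >= '0' && c <= '9'; };

    // The first character must be a letter.
    if (s.empty() || !isAsciiAlpha(s.front()))
        return false;

    // The remaining characters may also be digits, '+', '-' or '.'.
    for (char c : s.substr(1)) {
        if (!(isAsciiAlpha(c) || isAsciiDigit(c) || c == '+' || c == '-' || c == '.'))
            return false;
    }
    return true;
}
```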
1946
1947 The following example shows a URL where the scheme is "ftp":
1948
1949 \image qurl-authority2.png
1950
1951 To set the scheme, the following call is used:
1952 \snippet code/src_corelib_io_qurl.cpp 11
1953
1954 The scheme can also be empty, in which case the URL is interpreted
1955 as relative.
1956
1957 \sa scheme(), isRelative()
1958*/
1959void QUrl::setScheme(const QString &scheme)
1960{
1961 detach();
1962 d->clearError();
1963 if (scheme.isEmpty()) {
        // an empty scheme clears the scheme, which makes the URL relative
1965 d->sectionIsPresent &= ~QUrlPrivate::Scheme;
1966 d->flags &= ~QUrlPrivate::IsLocalFile;
1967 d->scheme.clear();
1968 } else {
        d->setScheme(scheme, scheme.size(), /* do set error */ true);
1970 }
1971}
1972
1973/*!
1974 Returns the scheme of the URL. If an empty string is returned,
1975 this means the scheme is undefined and the URL is then relative.
1976
    The scheme can only contain US-ASCII letters, digits, and the characters
    '+', '-' and '.', which means it cannot contain any character that would
    otherwise require encoding. Additionally, schemes are always returned in
    lowercase form.
1980
1981 \sa setScheme(), isRelative()
1982*/
1983QString QUrl::scheme() const
1984{
1985 if (!d) return QString();
1986
1987 return d->scheme;
1988}
1989
1990/*!
1991 Sets the authority of the URL to \a authority.
1992
1993 The authority of a URL is the combination of user info, a host
1994 name and a port. All of these elements are optional; an empty
1995 authority is therefore valid.
1996
1997 The user info and host are separated by a '@', and the host and
1998 port are separated by a ':'. If the user info is empty, the '@'
1999 must be omitted; although a stray ':' is permitted if the port is
2000 empty.
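
    The layout described above can be sketched as a small splitter. This is
    an illustration only: it works on std::string_view, the names
    AuthorityParts and splitAuthority() are hypothetical, and it performs no
    validation or decoding. It assumes that the first '@' ends the user info
    and that a ':' inside a bracketed IP literal is not a port separator.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type for the sketch.
struct AuthorityParts {
    std::optional<std::string> userInfo; // absent when there is no '@'
    std::string host;
    std::optional<std::string> port;     // absent when no ':' follows the host
};

AuthorityParts splitAuthority(std::string_view authority)
{
    AuthorityParts parts;

    // Everything before the first '@' is treated as user info.
    const auto at = authority.find('@');
    if (at != std::string_view::npos) {
        parts.userInfo = std::string(authority.substr(0, at));
        authority.remove_prefix(at + 1);
    }

    // The last ':' introduces the port, unless it sits inside "[...]".
    const auto colon = authority.rfind(':');
    const auto bracket = authority.rfind(']');
    if (colon != std::string_view::npos
            && (bracket == std::string_view::npos || colon > bracket)) {
        parts.port = std::string(authority.substr(colon + 1));
        authority = authority.substr(0, colon);
    }

    parts.host = std::string(authority);
    return parts;
}
```

    In this sketch a stray ':' with nothing after it yields a port that is
    present but empty, which corresponds to the permitted case mentioned
    above.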
2001
2002 The following example shows a valid authority string:
2003
2004 \image qurl-authority.png
2005
2006 The \a authority data is interpreted according to \a mode: in StrictMode,
2007 any '%' characters must be followed by exactly two hexadecimal characters
2008 and some characters (including space) are not allowed in undecoded form. In
2009 TolerantMode (the default), all characters are accepted in undecoded form
2010 and the tolerant parser will correct stray '%' not followed by two hex
2011 characters.
2012
2013 This function does not allow \a mode to be QUrl::DecodedMode. To set fully
2014 decoded data, call setUserName(), setPassword(), setHost() and setPort()
2015 individually.
2016
2017 \sa setUserInfo(), setHost(), setPort()
2018*/
2019void QUrl::setAuthority(const QString &authority, ParsingMode mode)
2020{
2021 detach();
2022 d->clearError();
2023
2024 if (mode == DecodedMode) {
        qWarning("QUrl::setAuthority(): QUrl::DecodedMode is not permitted in this function");
        return;
    }

    d->setAuthority(authority, 0, authority.size(), mode);
2030 if (authority.isNull()) {
2031 // QUrlPrivate::setAuthority cleared almost everything
2032 // but it leaves the Host bit set
2033 d->sectionIsPresent &= ~QUrlPrivate::Authority;
2034 }
2035}
2036
2037/*!
2038 Returns the authority of the URL if it is defined; otherwise
2039 an empty string is returned.
2040
    This function returns an unambiguous value, which may contain
    characters that are still percent-encoded, plus some control sequences
    not representable in decoded form in QString.
2044
2045 The \a options argument controls how to format the user info component. The
2046 value of QUrl::FullyDecoded is not permitted in this function. If you need
2047 to obtain fully decoded data, call userName(), password(), host() and
2048 port() individually.
2049
2050 \sa setAuthority(), userInfo(), userName(), password(), host(), port()
2051*/
2052QString QUrl::authority(ComponentFormattingOptions options) const
2053{
2054 QString result;
2055 if (!d)
2056 return result;
2057
2058 if (options == QUrl::FullyDecoded) {
        qWarning("QUrl::authority(): QUrl::FullyDecoded is not permitted in this function");
        return result;
    }

    d->appendAuthority(result, options, QUrlPrivate::Authority);
2064 return result;
2065}
2066
2067/*!
2068 Sets the user info of the URL to \a userInfo. The user info is an
2069 optional part of the authority of the URL, as described in
2070 setAuthority().
2071
2072 The user info consists of a user name and optionally a password,
2073 separated by a ':'. If the password is empty, the colon must be
2074 omitted. The following example shows a valid user info string:
2075
2076 \image qurl-authority3.png
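
    As a rough illustration of the format, the user info can be split at the
    first ':' into a user name and an optional password. The sketch below
    uses std::string_view, and splitUserInfo() is a hypothetical helper name;
    it does no decoding or validation.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: returns the user name and, if a ':' is present,
// the password that follows the first ':'.
std::pair<std::string, std::optional<std::string>> splitUserInfo(std::string_view userInfo)
{
    const auto colon = userInfo.find(':');
    if (colon == std::string_view::npos)
        return { std::string(userInfo), std::nullopt };

    // Any later ':' belongs to the password.
    return { std::string(userInfo.substr(0, colon)),
             std::string(userInfo.substr(colon + 1)) };
}
```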
2077
2078 The \a userInfo data is interpreted according to \a mode: in StrictMode,
2079 any '%' characters must be followed by exactly two hexadecimal characters
2080 and some characters (including space) are not allowed in undecoded form. In
2081 TolerantMode (the default), all characters are accepted in undecoded form
2082 and the tolerant parser will correct stray '%' not followed by two hex
2083 characters.
2084
2085 This function does not allow \a mode to be QUrl::DecodedMode. To set fully
2086 decoded data, call setUserName() and setPassword() individually.
2087
2088 \sa userInfo(), setUserName(), setPassword(), setAuthority()
2089*/
2090void QUrl::setUserInfo(const QString &userInfo, ParsingMode mode)
2091{
2092 detach();
2093 d->clearError();
2094 QString trimmed = userInfo.trimmed();
2095 if (mode == DecodedMode) {
        qWarning("QUrl::setUserInfo(): QUrl::DecodedMode is not permitted in this function");
        return;
    }

    d->setUserInfo(trimmed, 0, trimmed.size());
    if (userInfo.isNull()) {
        // QUrlPrivate::setUserInfo cleared almost everything
        // but it leaves the UserName bit set
        d->sectionIsPresent &= ~QUrlPrivate::UserInfo;
    } else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::UserInfo, userInfo)) {
2106 d->sectionIsPresent &= ~QUrlPrivate::UserInfo;
2107 d->userName.clear();
2108 d->password.clear();
2109 }
2110}
2111
2112/*!
2113 Returns the user info of the URL, or an empty string if the user
2114 info is undefined.
2115
    This function returns an unambiguous value, which may contain
    characters that are still percent-encoded, plus some control sequences
    not representable in decoded form in QString.
2119
2120 The \a options argument controls how to format the user info component. The
2121 value of QUrl::FullyDecoded is not permitted in this function. If you need
2122 to obtain fully decoded data, call userName() and password() individually.
2123
2124 \sa setUserInfo(), userName(), password(), authority()
2125*/
2126QString QUrl::userInfo(ComponentFormattingOptions options) const
2127{
2128 QString result;
2129 if (!d)
2130 return result;
2131
2132 if (options == QUrl::FullyDecoded) {
        qWarning("QUrl::userInfo(): QUrl::FullyDecoded is not permitted in this function");
        return result;
    }

    d->appendUserInfo(result, options, QUrlPrivate::UserInfo);
2138 return result;
2139}
2140
2141/*!
2142 Sets the URL's user name to \a userName. The \a userName is part
2143 of the user info element in the authority of the URL, as described
2144 in setUserInfo().
2145
2146 The \a userName data is interpreted according to \a mode: in StrictMode,
2147 any '%' characters must be followed by exactly two hexadecimal characters
2148 and some characters (including space) are not allowed in undecoded form. In
2149 TolerantMode (the default), all characters are accepted in undecoded form
2150 and the tolerant parser will correct stray '%' not followed by two hex
    characters. In DecodedMode, '%' characters stand for themselves and
    encoded characters are not possible.
2153
2154 QUrl::DecodedMode should be used when setting the user name from a data
2155 source which is not a URL, such as a password dialog shown to the user or
2156 with a user name obtained by calling userName() with the QUrl::FullyDecoded
2157 formatting option.
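
    One way to picture DecodedMode: since a '%' in decoded input is a literal
    percent sign, it has to be escaped before the text can be handled like
    URL data. The sketch below assumes that this is done by rewriting each
    '%' as "%25"; it uses std::string, and the helper name is hypothetical.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: makes every literal '%' in decoded input
// unambiguous by replacing it with the sequence "%25".
std::string escapePercentForDecodedInput(std::string_view decoded)
{
    std::string out;
    out.reserve(decoded.size());
    for (char c : decoded) {
        if (c == '%')
            out += "%25";
        else
            out += c;
    }
    return out;
}
```

    With this sketch, a user name typed as "100%" becomes "100%25", and the
    text "a%20b" becomes "a%2520b" instead of being read as "a b".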
2158
2159 \sa userName(), setUserInfo()
2160*/
2161void QUrl::setUserName(const QString &userName, ParsingMode mode)
2162{
2163 detach();
2164 d->clearError();
2165
2166 QString data = userName;
2167 if (mode == DecodedMode) {
2168 parseDecodedComponent(data);
2169 mode = TolerantMode;
2170 }
2171
    d->setUserName(data, 0, data.size());
    if (userName.isNull())
        d->sectionIsPresent &= ~QUrlPrivate::UserName;
    else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::UserName, userName))
2176 d->userName.clear();
2177}
2178
2179/*!
2180 Returns the user name of the URL if it is defined; otherwise
2181 an empty string is returned.
2182
2183 The \a options argument controls how to format the user name component. All
2184 values produce an unambiguous result. With QUrl::FullyDecoded, all
2185 percent-encoded sequences are decoded; otherwise, the returned value may
2186 contain some percent-encoded sequences for some control sequences not
2187 representable in decoded form in QString.
2188
2189 Note that QUrl::FullyDecoded may cause data loss if those non-representable
2190 sequences are present. It is recommended to use that value when the result
2191 will be used in a non-URL context, such as setting in QAuthenticator or
2192 negotiating a login.
2193
2194 \sa setUserName(), userInfo()
2195*/
2196QString QUrl::userName(ComponentFormattingOptions options) const
2197{
2198 QString result;
2199 if (d)
        d->appendUserName(result, options);
2201 return result;
2202}
2203
2204/*!
2205 Sets the URL's password to \a password. The \a password is part of
2206 the user info element in the authority of the URL, as described in
2207 setUserInfo().
2208
2209 The \a password data is interpreted according to \a mode: in StrictMode,
2210 any '%' characters must be followed by exactly two hexadecimal characters
2211 and some characters (including space) are not allowed in undecoded form. In
2212 TolerantMode, all characters are accepted in undecoded form and the
2213 tolerant parser will correct stray '%' not followed by two hex characters.
    In DecodedMode, '%' characters stand for themselves and encoded
    characters are not possible.
2216
2217 QUrl::DecodedMode should be used when setting the password from a data
2218 source which is not a URL, such as a password dialog shown to the user or
2219 with a password obtained by calling password() with the QUrl::FullyDecoded
2220 formatting option.
2221
2222 \sa password(), setUserInfo()
2223*/
2224void QUrl::setPassword(const QString &password, ParsingMode mode)
2225{
2226 detach();
2227 d->clearError();
2228
2229 QString data = password;
2230 if (mode == DecodedMode) {
2231 parseDecodedComponent(data);
2232 mode = TolerantMode;
2233 }
2234
    d->setPassword(data, 0, data.size());
    if (password.isNull())
        d->sectionIsPresent &= ~QUrlPrivate::Password;
    else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Password, password))
2239 d->password.clear();
2240}
2241
2242/*!
2243 Returns the password of the URL if it is defined; otherwise
2244 an empty string is returned.
2245
    The \a options argument controls how to format the password component. All
2247 values produce an unambiguous result. With QUrl::FullyDecoded, all
2248 percent-encoded sequences are decoded; otherwise, the returned value may
2249 contain some percent-encoded sequences for some control sequences not
2250 representable in decoded form in QString.
2251
2252 Note that QUrl::FullyDecoded may cause data loss if those non-representable
2253 sequences are present. It is recommended to use that value when the result
2254 will be used in a non-URL context, such as setting in QAuthenticator or
2255 negotiating a login.
2256
2257 \sa setPassword()
2258*/
2259QString QUrl::password(ComponentFormattingOptions options) const
2260{
2261 QString result;
2262 if (d)
        d->appendPassword(result, options);
2264 return result;
2265}
2266
2267/*!
2268 Sets the host of the URL to \a host. The host is part of the
2269 authority.
2270
2271 The \a host data is interpreted according to \a mode: in StrictMode,
2272 any '%' characters must be followed by exactly two hexadecimal characters
2273 and some characters (including space) are not allowed in undecoded form. In
2274 TolerantMode, all characters are accepted in undecoded form and the
2275 tolerant parser will correct stray '%' not followed by two hex characters.
    In DecodedMode, '%' characters stand for themselves and encoded
    characters are not possible.
2278
2279 Note that, in all cases, the result of the parsing must be a valid hostname
2280 according to STD 3 rules, as modified by the Internationalized Resource
2281 Identifiers specification (RFC 3987). Invalid hostnames are not permitted
2282 and will cause isValid() to become false.
2283
2284 \sa host(), setAuthority()
2285*/
2286void QUrl::setHost(const QString &host, ParsingMode mode)
2287{
2288 detach();
2289 d->clearError();
2290
2291 QString data = host;
2292 if (mode == DecodedMode) {
2293 parseDecodedComponent(data);
2294 mode = TolerantMode;
2295 }
2296
    if (d->setHost(data, 0, data.size(), mode)) {
        if (host.isNull())
            d->sectionIsPresent &= ~QUrlPrivate::Host;
    } else if (!data.startsWith(u'[')) {
        // setHost failed, it might be IPv6 or IPvFuture in need of bracketing
        Q_ASSERT(d->error);

        data.prepend(u'[');
        data.append(u']');
        if (!d->setHost(data, 0, data.size(), mode)) {
            // failed again
            if (data.contains(u':')) {
2309 // source data contains ':', so it's an IPv6 error
2310 d->error->code = QUrlPrivate::InvalidIPv6AddressError;
2311 }
2312 } else {
2313 // succeeded
2314 d->clearError();
2315 }
2316 }
2317}
2318
2319/*!
2320 Returns the host of the URL if it is defined; otherwise
2321 an empty string is returned.
2322
2323 The \a options argument controls how the hostname will be formatted. The
2324 QUrl::EncodeUnicode option will cause this function to return the hostname
2325 in the ASCII-Compatible Encoding (ACE) form, which is suitable for use in
2326 channels that are not 8-bit clean or that require the legacy hostname (such
2327 as DNS requests or in HTTP request headers). If that flag is not present,
    this function returns the Internationalized Domain Name (IDN) in Unicode form,
2329 according to the list of permissible top-level domains (see
2330 idnWhitelist()).
2331
2332 All other flags are ignored. Host names cannot contain control or percent
2333 characters, so the returned value can be considered fully decoded.
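
    If the formatted host starts with '[', as is the case for an IP literal
    such as an IPv6 address, this function returns it without the surrounding
    square brackets. The following sketch shows that last step on a
    std::string; the helper name is hypothetical, and unlike the code in this
    file it also checks for the closing bracket before stripping.

```cpp
#include <string>

// Hypothetical helper: "[::1]" becomes "::1"; other hosts are unchanged.
std::string stripIpLiteralBrackets(std::string host)
{
    if (host.size() >= 2 && host.front() == '[' && host.back() == ']')
        return host.substr(1, host.size() - 2);
    return host;
}
```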
2334
2335 \sa setHost(), idnWhitelist(), setIdnWhitelist(), authority()
2336*/
2337QString QUrl::host(ComponentFormattingOptions options) const
2338{
2339 QString result;
2340 if (d) {
        d->appendHost(result, options);
        if (result.startsWith(u'['))
            result = result.mid(1, result.size() - 2);
2344 }
2345 return result;
2346}
2347
2348/*!
2349 Sets the port of the URL to \a port. The port is part of the
2350 authority of the URL, as described in setAuthority().
2351
2352 \a port must be between 0 and 65535 inclusive. Setting the
2353 port to -1 indicates that the port is unspecified.
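
    The range rule can be summarized by the sketch below, which treats -1 as
    "unspecified" and maps any other out-of-range value to "unspecified plus
    an error". PortCheck and checkPort() are hypothetical names used only for
    this illustration.

```cpp
// Hypothetical result type: the port to store and whether the input was
// out of range.
struct PortCheck {
    int port;
    bool error;
};

PortCheck checkPort(int port)
{
    // -1 means "unspecified"; anything else outside 0..65535 is an error
    // and is stored as unspecified.
    if (port < -1 || port > 65535)
        return { -1, true };
    return { port, false };
}
```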
2354*/
2355void QUrl::setPort(int port)
2356{
2357 detach();
2358 d->clearError();
2359
2360 if (port < -1 || port > 65535) {
        d->setError(QUrlPrivate::InvalidPortError, QString::number(port), 0);
2362 port = -1;
2363 }
2364
2365 d->port = port;
2366 if (port != -1)
2367 d->sectionIsPresent |= QUrlPrivate::Host;
2368}
2369
2370/*!
2371 \since 4.1
2372
2373 Returns the port of the URL, or \a defaultPort if the port is
2374 unspecified.
2375
2376 Example:
2377
2378 \snippet code/src_corelib_io_qurl.cpp 3
2379*/
2380int QUrl::port(int defaultPort) const
2381{
2382 if (!d) return defaultPort;
2383 return d->port == -1 ? defaultPort : d->port;
2384}
2385
2386/*!
2387 Sets the path of the URL to \a path. The path is the part of the
2388 URL that comes after the authority but before the query string.
2389
2390 \image qurl-ftppath.png
2391
2392 For non-hierarchical schemes, the path will be everything
2393 following the scheme declaration, as in the following example:
2394
2395 \image qurl-mailtopath.png
2396
2397 The \a path data is interpreted according to \a mode: in StrictMode,
2398 any '%' characters must be followed by exactly two hexadecimal characters
2399 and some characters (including space) are not allowed in undecoded form. In
2400 TolerantMode, all characters are accepted in undecoded form and the
2401 tolerant parser will correct stray '%' not followed by two hex characters.
    In DecodedMode, '%' characters stand for themselves and encoded
    characters are not possible.
2404
2405 QUrl::DecodedMode should be used when setting the path from a data source
2406 which is not a URL, such as a dialog shown to the user or with a path
2407 obtained by calling path() with the QUrl::FullyDecoded formatting option.
2408
2409 \sa path()
2410*/
2411void QUrl::setPath(const QString &path, ParsingMode mode)
2412{
2413 detach();
2414 d->clearError();
2415
2416 QString data = path;
2417 if (mode == DecodedMode) {
2418 parseDecodedComponent(data);
2419 mode = TolerantMode;
2420 }
2421
    d->setPath(data, 0, data.size());

    // optimized out, since there is no path delimiter
//    if (path.isNull())
//        d->sectionIsPresent &= ~QUrlPrivate::Path;
//    else
    if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Path, path))
2429 d->path.clear();
2430}
2431
2432/*!
2433 Returns the path of the URL.
2434
2435 \snippet code/src_corelib_io_qurl.cpp 12
2436
2437 The \a options argument controls how to format the path component. All
2438 values produce an unambiguous result. With QUrl::FullyDecoded, all
2439 percent-encoded sequences are decoded; otherwise, the returned value may
2440 contain some percent-encoded sequences for some control sequences not
2441 representable in decoded form in QString.
2442
2443 Note that QUrl::FullyDecoded may cause data loss if those non-representable
2444 sequences are present. It is recommended to use that value when the result
2445 will be used in a non-URL context, such as sending to an FTP server.
2446
2447 An example of data loss is when you have non-Unicode percent-encoded sequences
2448 and use FullyDecoded (the default):
2449
2450 \snippet code/src_corelib_io_qurl.cpp 13
2451
2452 In this example, there will be some level of data loss because the \c %FF cannot
2453 be converted.
2454
2455 Data loss can also occur when the path contains sub-delimiters (such as \c +):
2456
2457 \snippet code/src_corelib_io_qurl.cpp 14
2458
2459 Other decoding examples:
2460
2461 \snippet code/src_corelib_io_qurl.cpp 15
2462
2463 \sa setPath()
2464*/
2465QString QUrl::path(ComponentFormattingOptions options) const
2466{
2467 QString result;
2468 if (d)
        d->appendPath(result, options, QUrlPrivate::Path);
2470 return result;
2471}
2472
2473/*!
2474 \since 5.2
2475
2476 Returns the name of the file, excluding the directory path.
2477
2478 Note that, if this QUrl object is given a path ending in a slash, the name of the file is considered empty.
2479
2480 If the path doesn't contain any slash, it is fully returned as the fileName.
2481
2482 Example:
2483
2484 \snippet code/src_corelib_io_qurl.cpp 7
2485
2486 The \a options argument controls how to format the file name component. All
2487 values produce an unambiguous result. With QUrl::FullyDecoded, all
2488 percent-encoded sequences are decoded; otherwise, the returned value may
2489 contain some percent-encoded sequences for some control sequences not
2490 representable in decoded form in QString.
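
    The rule for extracting the file name can be sketched on a plain
    std::string_view as follows. The helper name fileNameOf() is hypothetical,
    and the sketch only covers the slash handling, not the formatting options.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: everything after the last '/', or the whole path
// if it contains no '/'.
std::string fileNameOf(std::string_view path)
{
    const auto slash = path.rfind('/');
    if (slash == std::string_view::npos)
        return std::string(path);
    return std::string(path.substr(slash + 1));
}
```

    For "/home/user/file.txt" this gives "file.txt"; for "/home/user/" it
    gives an empty string; for "file.txt" it gives the input unchanged.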
2491
2492 \sa path()
2493*/
2494QString QUrl::fileName(ComponentFormattingOptions options) const
2495{
2496 const QString ourPath = path(options);
    const qsizetype slash = ourPath.lastIndexOf(u'/');
    if (slash == -1)
        return ourPath;
    return ourPath.mid(slash + 1);
2501}
2502
2503/*!
2504 \since 4.2
2505
    Returns \c true if this URL contains a query (that is, if a '?' was present in it).
2507
2508 \sa setQuery(), query(), hasFragment()
2509*/
2510bool QUrl::hasQuery() const
2511{
2512 if (!d) return false;
2513 return d->hasQuery();
2514}
2515
2516/*!
2517 Sets the query string of the URL to \a query.
2518
2519 This function is useful if you need to pass a query string that
2520 does not fit into the key-value pattern, or that uses a different
2521 scheme for encoding special characters than what is suggested by
2522 QUrl.
2523
2524 Passing a value of QString() to \a query (a null QString) unsets
2525 the query completely. However, passing a value of QString("")
2526 will set the query to an empty value, as if the original URL
2527 had a lone "?".
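
    The difference between an unset query and an empty one can be modelled
    with std::optional, where an absent value plays the role of a null
    QString. This is only a sketch of the idea; withQuery() is a hypothetical
    helper and does not exist in Qt.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: appends "?query" only when a query is present,
// even if that query is empty.
std::string withQuery(const std::string &base, const std::optional<std::string> &query)
{
    if (!query)
        return base;           // unset: no '?' at all
    return base + '?' + *query; // present: '?' appears even when empty
}
```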
2528
2529 The \a query data is interpreted according to \a mode: in StrictMode,
2530 any '%' characters must be followed by exactly two hexadecimal characters
2531 and some characters (including space) are not allowed in undecoded form. In
2532 TolerantMode, all characters are accepted in undecoded form and the
2533 tolerant parser will correct stray '%' not followed by two hex characters.
    In DecodedMode, '%' characters stand for themselves and encoded
    characters are not possible.
2536
2537 Query strings often contain percent-encoded sequences, so use of
2538 DecodedMode is discouraged. One special sequence to be aware of is that of
2539 the plus character ('+'). QUrl does not convert spaces to plus characters,
2540 even though HTML forms posted by web browsers do. In order to represent an
2541 actual plus character in a query, the sequence "%2B" is usually used. This
2542 function will leave "%2B" sequences untouched in TolerantMode or
2543 StrictMode.
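
    To see why "%2B" has to survive untouched, consider a receiver that
    decodes values the way HTML form submissions are commonly decoded: '+'
    becomes a space and "%2B" becomes a plus. The sketch below is such a
    decoder written for illustration; hexValue() and decodeFormValue() are
    hypothetical names, and this is not something QUrl does.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Returns 0..15 for a hexadecimal digit and -1 for anything else.
int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical form-style decoder: '+' is a space, "%XX" is one byte.
std::string decodeFormValue(std::string_view encoded)
{
    std::string out;
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        const char c = encoded[i];
        if (c == '+') {
            out += ' ';
        } else if (c == '%' && i + 2 < encoded.size()
                   && hexValue(encoded[i + 1]) >= 0 && hexValue(encoded[i + 2]) >= 0) {
            out += static_cast<char>(hexValue(encoded[i + 1]) * 16 + hexValue(encoded[i + 2]));
            i += 2; // skip the two hex digits
        } else {
            out += c; // stray '%' or ordinary character
        }
    }
    return out;
}
```

    Such a receiver reads "a+b" as "a b" and "a%2Bb" as "a+b", so turning
    "%2B" into '+' before sending would change the meaning of the value.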
2544
2545 \sa query(), hasQuery()
2546*/
2547void QUrl::setQuery(const QString &query, ParsingMode mode)
2548{
2549 detach();
2550 d->clearError();
2551
2552 QString data = query;
2553 if (mode == DecodedMode) {
2554 parseDecodedComponent(data);
2555 mode = TolerantMode;
2556 }
2557
    d->setQuery(data, 0, data.size());
    if (query.isNull())
        d->sectionIsPresent &= ~QUrlPrivate::Query;
    else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Query, query))
2562 d->query.clear();
2563}
2564
2565/*!
2566 \overload
2567 \since 5.0
2568 Sets the query string of the URL to \a query.
2569
    This function reconstructs the query string from the QUrlQuery object and
    sets it on this QUrl object. This function does not have parsing
    parameters because the QUrlQuery contains data that is already parsed.
2573
2574 \sa query(), hasQuery()
2575*/
2576void QUrl::setQuery(const QUrlQuery &query)
2577{
2578 detach();
2579 d->clearError();
2580
2581 // we know the data is in the right format
2582 d->query = query.toString();
2583 if (query.isEmpty())
2584 d->sectionIsPresent &= ~QUrlPrivate::Query;
2585 else
2586 d->sectionIsPresent |= QUrlPrivate::Query;
2587}
2588
2589/*!
2590 Returns the query string of the URL if there's a query string, or an empty
2591 result if not. To determine if the parsed URL contained a query string, use
2592 hasQuery().
2593
2594 The \a options argument controls how to format the query component. All
2595 values produce an unambiguous result. With QUrl::FullyDecoded, all
2596 percent-encoded sequences are decoded; otherwise, the returned value may
2597 contain some percent-encoded sequences for some control sequences not
2598 representable in decoded form in QString.
2599
2600 Note that use of QUrl::FullyDecoded in queries is discouraged, as queries
2601 often contain data that is supposed to remain percent-encoded, including
2602 the use of the "%2B" sequence to represent a plus character ('+').
2603
2604 \sa setQuery(), hasQuery()
2605*/
2606QString QUrl::query(ComponentFormattingOptions options) const
2607{
2608 QString result;
2609 if (d) {
        d->appendQuery(result, options, QUrlPrivate::Query);
2611 if (d->hasQuery() && result.isNull())
2612 result.detach();
2613 }
2614 return result;
2615}
2616
2617/*!
2618 Sets the fragment of the URL to \a fragment. The fragment is the
2619 last part of the URL, represented by a '#' followed by a string of
2620 characters. It is typically used in HTTP for referring to a
2621 certain link or point on a page:
2622
2623 \image qurl-fragment.png
2624
2625 The fragment is sometimes also referred to as the URL "reference".
2626
2627 Passing an argument of QString() (a null QString) will unset the fragment.
2628 Passing an argument of QString("") (an empty but not null QString) will set the
2629 fragment to an empty string (as if the original URL had a lone "#").
2630
2631 The \a fragment data is interpreted according to \a mode: in StrictMode,
2632 any '%' characters must be followed by exactly two hexadecimal characters
2633 and some characters (including space) are not allowed in undecoded form. In
2634 TolerantMode, all characters are accepted in undecoded form and the
2635 tolerant parser will correct stray '%' not followed by two hex characters.
    In DecodedMode, '%' characters stand for themselves and percent-encoded
    sequences cannot be used.
2638
2639 QUrl::DecodedMode should be used when setting the fragment from a data
2640 source which is not a URL or with a fragment obtained by calling
2641 fragment() with the QUrl::FullyDecoded formatting option.
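
    The following sketch (with a made-up URL) illustrates the null versus
    empty distinction and the effect of QUrl::DecodedMode:

    \code
    QUrl url("http://example.com/page#top");

    url.setFragment(QString(""));   // empty but not null: a lone "#" remains
    // url.hasFragment() == true,  url.toString() == "http://example.com/page#"

    url.setFragment(QString());     // null: the fragment is removed
    // url.hasFragment() == false, url.toString() == "http://example.com/page"

    url.setFragment("50% done", QUrl::DecodedMode);  // '%' is taken literally
    // url.fragment(QUrl::FullyDecoded) == "50% done"
    \endcode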
2642
2643 \sa fragment(), hasFragment()
2644*/
void QUrl::setFragment(const QString &fragment, ParsingMode mode)
{
    detach();
    d->clearError();

    QString data = fragment;
    if (mode == DecodedMode) {
        parseDecodedComponent(data);
        mode = TolerantMode;
    }

    d->setFragment(data, 0, data.size());
    if (fragment.isNull())
        d->sectionIsPresent &= ~QUrlPrivate::Fragment;
    else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Fragment, fragment))
        d->fragment.clear();
}
2662
2663/*!
2664 Returns the fragment of the URL. To determine if the parsed URL contained a
2665 fragment, use hasFragment().
2666
2667 The \a options argument controls how to format the fragment component. All
2668 values produce an unambiguous result. With QUrl::FullyDecoded, all
2669 percent-encoded sequences are decoded; otherwise, the returned value may
2670 contain some percent-encoded sequences for some control sequences not
2671 representable in decoded form in QString.
2672
2673 Note that QUrl::FullyDecoded may cause data loss if those non-representable
2674 sequences are present. It is recommended to use that value when the result
2675 will be used in a non-URL context.
2676
2677 \sa setFragment(), hasFragment()
2678*/
QString QUrl::fragment(ComponentFormattingOptions options) const
{
    QString result;
    if (d) {
        d->appendFragment(result, options, QUrlPrivate::Fragment);
        if (d->hasFragment() && result.isNull())
            result.detach();
    }
    return result;
}
2689
2690/*!
2691 \since 4.2
2692
2693 Returns \c true if this URL contains a fragment (i.e., if # was seen on it).
2694
2695 \sa fragment(), setFragment()
2696*/
2697bool QUrl::hasFragment() const
2698{
2699 if (!d) return false;
2700 return d->hasFragment();
2701}
2702
2703/*!
2704 Returns the result of the merge of this URL with \a relative. This
2705 URL is used as a base to convert \a relative to an absolute URL.
2706
2707 If \a relative is not a relative URL, this function will return \a
2708 relative directly. Otherwise, the paths of the two URLs are
2709 merged, and the new URL returned has the scheme and authority of
2710 the base URL, but with the merged path, as in the following
2711 example:
2712
2713 \snippet code/src_corelib_io_qurl.cpp 5
2714
2715 Calling resolved() with ".." returns a QUrl whose directory is
2716 one level higher than the original. Similarly, calling resolved()
2717 with "../.." removes two levels from the path. If \a relative is
2718 "/", the path becomes "/".
2719
2720 \sa isRelative()
2721*/
2722QUrl QUrl::resolved(const QUrl &relative) const
2723{
2724 if (!d) return relative;
2725 if (!relative.d) return *this;
2726
2727 QUrl t;
2728 if (!relative.d->scheme.isEmpty()) {
2729 t = relative;
2730 t.detach();
2731 } else {
2732 if (relative.d->hasAuthority()) {
2733 t = relative;
2734 t.detach();
2735 } else {
2736 t.d = new QUrlPrivate;
2737
2738 // copy the authority
2739 t.d->userName = d->userName;
2740 t.d->password = d->password;
2741 t.d->host = d->host;
2742 t.d->port = d->port;
2743 t.d->sectionIsPresent = d->sectionIsPresent & QUrlPrivate::Authority;
2744
2745 if (relative.d->path.isEmpty()) {
2746 t.d->path = d->path;
2747 if (relative.d->hasQuery()) {
2748 t.d->query = relative.d->query;
2749 t.d->sectionIsPresent |= QUrlPrivate::Query;
2750 } else if (d->hasQuery()) {
2751 t.d->query = d->query;
2752 t.d->sectionIsPresent |= QUrlPrivate::Query;
2753 }
2754 } else {
2755 t.d->path = relative.d->path.startsWith(c: u'/')
2756 ? relative.d->path
2757 : d->mergePaths(relativePath: relative.d->path);
2758 if (relative.d->hasQuery()) {
2759 t.d->query = relative.d->query;
2760 t.d->sectionIsPresent |= QUrlPrivate::Query;
2761 }
2762 }
2763 }
2764 t.d->scheme = d->scheme;
2765 if (d->hasScheme())
2766 t.d->sectionIsPresent |= QUrlPrivate::Scheme;
2767 else
2768 t.d->sectionIsPresent &= ~QUrlPrivate::Scheme;
2769 t.d->flags |= d->flags & QUrlPrivate::IsLocalFile;
2770 }
2771 t.d->fragment = relative.d->fragment;
2772 if (relative.d->hasFragment())
2773 t.d->sectionIsPresent |= QUrlPrivate::Fragment;
2774 else
2775 t.d->sectionIsPresent &= ~QUrlPrivate::Fragment;
2776
2777 removeDotsFromPath(path: &t.d->path);
2778
2779#if defined(QURL_DEBUG)
2780 qDebug("QUrl(\"%ls\").resolved(\"%ls\") = \"%ls\"",
2781 qUtf16Printable(url()),
2782 qUtf16Printable(relative.url()),
2783 qUtf16Printable(t.url()));
2784#endif
2785 return t;
2786}
2787
2788/*!
    Returns \c true if the URL is relative; otherwise returns \c false. A URL is
    a relative reference if its scheme is undefined; this function is therefore
    equivalent to calling scheme().isEmpty().
2792
2793 Relative references are defined in RFC 3986 section 4.2.
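
    For illustration (the URLs below are arbitrary examples):

    \code
    QUrl("docs/index.html").isRelative();          // true: no scheme
    QUrl("/docs/index.html").isRelative();         // true: absolute path, but still no scheme
    QUrl("//example.com/docs").isRelative();       // true: authority, but no scheme
    QUrl("http://example.com/docs").isRelative();  // false
    \endcode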
2794
2795 \sa {Relative URLs vs Relative Paths}
2796*/
2797bool QUrl::isRelative() const
2798{
2799 if (!d) return true;
2800 return !d->hasScheme();
2801}
2802
2803/*!
2804 Returns a string representation of the URL. The output can be customized by
2805 passing flags with \a options. The option QUrl::FullyDecoded is not
2806 permitted in this function since it would generate ambiguous data.
2807
2808 The resulting QString can be passed back to a QUrl later on.
2809
2810 Synonym for toString(options).
2811
2812 \sa FormattingOptions, toEncoded(), toString()
2813*/
2814QString QUrl::url(FormattingOptions options) const
2815{
2816 return toString(options);
2817}
2818
2819/*!
2820 Returns a string representation of the URL. The output can be customized by
2821 passing flags with \a options. The option QUrl::FullyDecoded is not
2822 permitted in this function since it would generate ambiguous data.
2823
2824 The default formatting option is \l{QUrl::FormattingOptions}{PrettyDecoded}.
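
    As a sketch of how the flags combine (the URL is a made-up example):

    \code
    QUrl url("https://user:secret@example.com:8080/docs/index.html?lang=en#intro");

    url.toString(QUrl::RemoveUserInfo | QUrl::RemoveQuery);
    // "https://example.com:8080/docs/index.html#intro"

    url.toString(QUrl::RemoveScheme | QUrl::RemoveAuthority);
    // "/docs/index.html?lang=en#intro"
    \endcode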
2825
2826 \sa FormattingOptions, url(), setUrl()
2827*/
2828QString QUrl::toString(FormattingOptions options) const
2829{
2830 QString url;
2831 if (!isValid()) {
2832 // also catches isEmpty()
2833 return url;
2834 }
2835 if ((options & QUrl::FullyDecoded) == QUrl::FullyDecoded) {
2836 qWarning(msg: "QUrl: QUrl::FullyDecoded is not permitted when reconstructing the full URL");
2837 options &= ~QUrl::FullyDecoded;
2838 //options |= QUrl::PrettyDecoded; // no-op, value is 0
2839 }
2840
2841 // return just the path if:
2842 // - QUrl::PreferLocalFile is passed
2843 // - QUrl::RemovePath isn't passed (rather stupid if the user did...)
2844 // - there's no query or fragment to return
2845 // that is, either they aren't present, or we're removing them
2846 // - it's a local file
2847 if (options.testFlag(f: QUrl::PreferLocalFile) && !options.testFlag(f: QUrl::RemovePath)
2848 && (!d->hasQuery() || options.testFlag(f: QUrl::RemoveQuery))
2849 && (!d->hasFragment() || options.testFlag(f: QUrl::RemoveFragment))
2850 && isLocalFile()) {
2851 url = d->toLocalFile(options: options | QUrl::FullyDecoded);
2852 return url;
2853 }
2854
2855 // for the full URL, we consider that the reserved characters are prettier if encoded
2856 if (options & DecodeReserved)
2857 options &= ~EncodeReserved;
2858 else
2859 options |= EncodeReserved;
2860
2861 if (!(options & QUrl::RemoveScheme) && d->hasScheme())
2862 url += d->scheme + u':';
2863
2864 bool pathIsAbsolute = d->path.startsWith(c: u'/');
2865 if (!((options & QUrl::RemoveAuthority) == QUrl::RemoveAuthority) && d->hasAuthority()) {
2866 url += "//"_L1;
2867 d->appendAuthority(appendTo&: url, options, appendingTo: QUrlPrivate::FullUrl);
2868 } else if (isLocalFile() && pathIsAbsolute) {
2869 // Comply with the XDG file URI spec, which requires triple slashes.
2870 url += "//"_L1;
2871 }
2872
2873 if (!(options & QUrl::RemovePath))
2874 d->appendPath(appendTo&: url, options, appendingTo: QUrlPrivate::FullUrl);
2875
2876 if (!(options & QUrl::RemoveQuery) && d->hasQuery()) {
2877 url += u'?';
2878 d->appendQuery(appendTo&: url, options, appendingTo: QUrlPrivate::FullUrl);
2879 }
2880 if (!(options & QUrl::RemoveFragment) && d->hasFragment()) {
2881 url += u'#';
2882 d->appendFragment(appendTo&: url, options, appendingTo: QUrlPrivate::FullUrl);
2883 }
2884
2885 return url;
2886}
2887
2888/*!
2889 \since 5.0
2890
2891 Returns a human-displayable string representation of the URL.
2892 The output can be customized by passing flags with \a options.
2893 The option RemovePassword is always enabled, since passwords
2894 should never be shown back to users.
2895
2896 With the default options, the resulting QString can be passed back
2897 to a QUrl later on, but any password that was present initially will
2898 be lost.
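
    For example, with a made-up FTP URL that carries credentials:

    \code
    QUrl url("ftp://alice:secret@ftp.example.com/pub/file.txt");
    url.toString();         // "ftp://alice:secret@ftp.example.com/pub/file.txt"
    url.toDisplayString();  // "ftp://alice@ftp.example.com/pub/file.txt"
    \endcode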
2899
2900 \sa FormattingOptions, toEncoded(), toString()
2901*/
2902
QString QUrl::toDisplayString(FormattingOptions options) const
{
    return toString(options | RemovePassword);
}
2907
2908/*!
2909 \since 5.2
2910
2911 Returns an adjusted version of the URL.
2912 The output can be customized by passing flags with \a options.
2913
2914 The encoding options from QUrl::ComponentFormattingOption don't make
2915 much sense for this method, nor does QUrl::PreferLocalFile.
2916
2917 This is always equivalent to QUrl(url.toString(options)).
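
    A short sketch of the path-related options, using made-up URLs:

    \code
    QUrl("http://example.com/dir/file.txt").adjusted(QUrl::RemoveFilename);
    // QUrl("http://example.com/dir/")

    QUrl("http://example.com/dir/").adjusted(QUrl::StripTrailingSlash);
    // QUrl("http://example.com/dir")

    QUrl("http://example.com/a/../b").adjusted(QUrl::NormalizePathSegments);
    // QUrl("http://example.com/b")
    \endcode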
2918
2919 \sa FormattingOptions, toEncoded(), toString()
2920*/
QUrl QUrl::adjusted(QUrl::FormattingOptions options) const
{
    if (!isValid()) {
        // also catches isEmpty()
        return QUrl();
    }
    QUrl that = *this;
    if (options & RemoveScheme)
        that.setScheme(QString());
    if ((options & RemoveAuthority) == RemoveAuthority) {
        that.setAuthority(QString());
    } else {
        if ((options & RemoveUserInfo) == RemoveUserInfo)
            that.setUserInfo(QString());
        else if (options & RemovePassword)
            that.setPassword(QString());
        if (options & RemovePort)
            that.setPort(-1);
    }
    if (options & RemoveQuery)
        that.setQuery(QString());
    if (options & RemoveFragment)
        that.setFragment(QString());
    if (options & RemovePath) {
        that.setPath(QString());
    } else if (options & (StripTrailingSlash | RemoveFilename | NormalizePathSegments)) {
        that.detach();
        QString path;
        d->appendPath(path, options | FullyEncoded, QUrlPrivate::Path);
        that.d->setPath(path, 0, path.size());
    }
    return that;
}
2954
2955/*!
2956 Returns the encoded representation of the URL if it's valid;
2957 otherwise an empty QByteArray is returned. The output can be
2958 customized by passing flags with \a options.
2959
2960 The user info, path and fragment are all converted to UTF-8, and
2961 all non-ASCII characters are then percent encoded. The host name
2962 is encoded using Punycode.
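
    As a sketch, reusing the example URL from the class description:

    \code
    QUrl url("http://bühler.example.com/List of applicants.xml");
    QByteArray encoded = url.toEncoded();
    // "http://xn--bhler-kva.example.com/List%20of%20applicants.xml"
    \endcode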
2963*/
QByteArray QUrl::toEncoded(FormattingOptions options) const
{
    options &= ~(FullyDecoded | FullyEncoded);
    return toString(options | FullyEncoded).toLatin1();
}
2969
2970/*!
    \fn QUrl QUrl::fromEncoded(const QByteArray &input, ParsingMode mode)
2972
2973 Parses \a input and returns the corresponding QUrl. \a input is
2974 assumed to be in encoded form, containing only ASCII characters.
2975
2976 Parses the URL using \a mode. See setUrl() for more information on
2977 this parameter. QUrl::DecodedMode is not permitted in this context.
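
    A minimal sketch, assuming an already percent-encoded input (the URL is
    made up):

    \code
    QUrl url = QUrl::fromEncoded(QByteArray("http://example.com/List%20of%20applicants.xml"));
    // url.path() == "/List of applicants.xml"
    // url.toEncoded() == "http://example.com/List%20of%20applicants.xml"
    \endcode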
2978
2979 \sa toEncoded(), setUrl()
2980*/
QUrl QUrl::fromEncoded(const QByteArray &input, ParsingMode mode)
{
    return QUrl(QString::fromUtf8(input.constData(), input.size()), mode);
}
2985
2986/*!
    Returns a decoded copy of \a input. \a input is first decoded from
    percent encoding, then converted from UTF-8 to Unicode.
2989
2990 \note Given invalid input (such as a string containing the sequence "%G5",
2991 which is not a valid hexadecimal number) the output will be invalid as
2992 well. As an example: the sequence "%G5" could be decoded to 'W'.
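
    For valid input, a sketch with an arbitrary example string:

    \code
    QString text = QUrl::fromPercentEncoding(QByteArray("caf%C3%A9%20menu"));
    // text == "café menu": "%C3%A9" is the UTF-8 encoding of 'é'
    \endcode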
2993*/
QString QUrl::fromPercentEncoding(const QByteArray &input)
{
    QByteArray ba = QByteArray::fromPercentEncoding(input);
    return QString::fromUtf8(ba, ba.size());
}
2999
3000/*!
    Returns an encoded copy of \a input. \a input is first converted
    to UTF-8, and all ASCII characters that are not in the unreserved group
    are percent-encoded. To prevent characters from being percent-encoded,
    pass them to \a exclude. To force characters to be percent-encoded, pass
    them to \a include.
3006
3007 Unreserved is defined as:
3008 \tt {ALPHA / DIGIT / "-" / "." / "_" / "~"}
3009
3010 \snippet code/src_corelib_io_qurl.cpp 6
3011*/
3012QByteArray QUrl::toPercentEncoding(const QString &input, const QByteArray &exclude, const QByteArray &include)
3013{
3014 return input.toUtf8().toPercentEncoding(exclude, include);
3015}
3016
3017/*!
3018 \since 6.3
3019
3020 Returns the Unicode form of the given domain name
3021 \a domain, which is encoded in the ASCII Compatible Encoding (ACE).
3022 The output can be customized by passing flags with \a options.
3023 The result of this function is considered equivalent to \a domain.
3024
3025 If the value in \a domain cannot be encoded, it will be converted
3026 to QString and returned.
3027
3028 The ASCII-Compatible Encoding (ACE) is defined by RFC 3490, RFC 3491
3029 and RFC 3492 and updated by the Unicode Technical Standard #46. It is part
3030 of the Internationalizing Domain Names in Applications (IDNA) specification,
3031 which allows for domain names (like \c "example.com") to be written using
3032 non-US-ASCII characters.
3033*/
QString QUrl::fromAce(const QByteArray &domain, QUrl::AceProcessingOptions options)
{
    return qt_ACE_do(QString::fromLatin1(domain), NormalizeAce,
                     ForbidLeadingDot /*FIXME: make configurable*/, options);
}
3039
3040/*!
3041 \since 6.3
3042
3043 Returns the ASCII Compatible Encoding of the given domain name \a domain.
3044 The output can be customized by passing flags with \a options.
3045 The result of this function is considered equivalent to \a domain.
3046
3047 The ASCII-Compatible Encoding (ACE) is defined by RFC 3490, RFC 3491
3048 and RFC 3492 and updated by the Unicode Technical Standard #46. It is part
3049 of the Internationalizing Domain Names in Applications (IDNA) specification,
3050 which allows for domain names (like \c "example.com") to be written using
3051 non-US-ASCII characters.
3052
3053 This function returns an empty QByteArray if \a domain is not a valid
3054 hostname. Note, in particular, that IPv6 literals are not valid domain
3055 names.
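
    As a sketch, reusing the domain from the class description. Passing
    QUrl::IgnoreIDNWhitelist to fromAce() is a choice made for this example,
    so that the conversion back does not depend on the top-level domain:

    \code
    QByteArray ace = QUrl::toAce("bühler.example.com");
    // ace == "xn--bhler-kva.example.com"

    QString unicode = QUrl::fromAce(ace, QUrl::IgnoreIDNWhitelist);
    // expected to give back the Unicode form of the domain
    \endcode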
3056*/
QByteArray QUrl::toAce(const QString &domain, AceProcessingOptions options)
{
    return qt_ACE_do(domain, ToAceOnly, ForbidLeadingDot /*FIXME: make configurable*/, options)
            .toLatin1();
}
3062
3063/*!
3064 \internal
3065
3066 Returns \c true if this URL is "less than" the given \a url. This
3067 provides a means of ordering URLs.
3068*/
3069bool QUrl::operator <(const QUrl &url) const
3070{
3071 if (!d || !url.d) {
3072 bool thisIsEmpty = !d || d->isEmpty();
3073 bool thatIsEmpty = !url.d || url.d->isEmpty();
3074
3075 // sort an empty URL first
3076 return thisIsEmpty && !thatIsEmpty;
3077 }
3078
3079 int cmp;
3080 cmp = d->scheme.compare(s: url.d->scheme);
3081 if (cmp != 0)
3082 return cmp < 0;
3083
3084 cmp = d->userName.compare(s: url.d->userName);
3085 if (cmp != 0)
3086 return cmp < 0;
3087
3088 cmp = d->password.compare(s: url.d->password);
3089 if (cmp != 0)
3090 return cmp < 0;
3091
3092 cmp = d->host.compare(s: url.d->host);
3093 if (cmp != 0)
3094 return cmp < 0;
3095
3096 if (d->port != url.d->port)
3097 return d->port < url.d->port;
3098
3099 cmp = d->path.compare(s: url.d->path);
3100 if (cmp != 0)
3101 return cmp < 0;
3102
3103 if (d->hasQuery() != url.d->hasQuery())
3104 return url.d->hasQuery();
3105
3106 cmp = d->query.compare(s: url.d->query);
3107 if (cmp != 0)
3108 return cmp < 0;
3109
3110 if (d->hasFragment() != url.d->hasFragment())
3111 return url.d->hasFragment();
3112
3113 cmp = d->fragment.compare(s: url.d->fragment);
3114 return cmp < 0;
3115}
3116
3117/*!
3118 Returns \c true if this URL and the given \a url are equal;
3119 otherwise returns \c false.
3120
3121 \sa matches()
3122*/
3123bool QUrl::operator ==(const QUrl &url) const
3124{
3125 if (!d && !url.d)
3126 return true;
3127 if (!d)
3128 return url.d->isEmpty();
3129 if (!url.d)
3130 return d->isEmpty();
3131
3132 // First, compare which sections are present, since it speeds up the
3133 // processing considerably. We just have to ignore the host-is-present flag
3134 // for local files (the "file" protocol), due to the requirements of the
3135 // XDG file URI specification.
3136 int mask = QUrlPrivate::FullUrl;
3137 if (isLocalFile())
3138 mask &= ~QUrlPrivate::Host;
3139 return (d->sectionIsPresent & mask) == (url.d->sectionIsPresent & mask) &&
3140 d->scheme == url.d->scheme &&
3141 d->userName == url.d->userName &&
3142 d->password == url.d->password &&
3143 d->host == url.d->host &&
3144 d->port == url.d->port &&
3145 d->path == url.d->path &&
3146 d->query == url.d->query &&
3147 d->fragment == url.d->fragment;
3148}
3149
3150/*!
3151 \since 5.2
3152
3153 Returns \c true if this URL and the given \a url are equal after
3154 applying \a options to both; otherwise returns \c false.
3155
    This is equivalent to calling adjusted(options) on both URLs
    and comparing the resulting URLs, but faster.
3158
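    As a sketch with two made-up URLs that differ only in their fragment:

    \code
    QUrl a("http://example.com/index.html#top");
    QUrl b("http://example.com/index.html#bottom");

    a == b;                               // false: the fragments differ
    a.matches(b, QUrl::RemoveFragment);   // true: fragments are ignored
    \endcode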
3159*/
3160bool QUrl::matches(const QUrl &url, FormattingOptions options) const
3161{
3162 if (!d && !url.d)
3163 return true;
3164 if (!d)
3165 return url.d->isEmpty();
3166 if (!url.d)
3167 return d->isEmpty();
3168
3169 // First, compare which sections are present, since it speeds up the
3170 // processing considerably. We just have to ignore the host-is-present flag
3171 // for local files (the "file" protocol), due to the requirements of the
3172 // XDG file URI specification.
3173 int mask = QUrlPrivate::FullUrl;
3174 if (isLocalFile())
3175 mask &= ~QUrlPrivate::Host;
3176
3177 if (options.testFlag(f: QUrl::RemoveScheme))
3178 mask &= ~QUrlPrivate::Scheme;
3179 else if (d->scheme != url.d->scheme)
3180 return false;
3181
3182 if (options.testFlag(f: QUrl::RemovePassword))
3183 mask &= ~QUrlPrivate::Password;
3184 else if (d->password != url.d->password)
3185 return false;
3186
3187 if (options.testFlag(f: QUrl::RemoveUserInfo))
3188 mask &= ~QUrlPrivate::UserName;
3189 else if (d->userName != url.d->userName)
3190 return false;
3191
3192 if (options.testFlag(f: QUrl::RemovePort))
3193 mask &= ~QUrlPrivate::Port;
3194 else if (d->port != url.d->port)
3195 return false;
3196
3197 if (options.testFlag(f: QUrl::RemoveAuthority))
3198 mask &= ~QUrlPrivate::Host;
3199 else if (d->host != url.d->host)
3200 return false;
3201
3202 if (options.testFlag(f: QUrl::RemoveQuery))
3203 mask &= ~QUrlPrivate::Query;
3204 else if (d->query != url.d->query)
3205 return false;
3206
3207 if (options.testFlag(f: QUrl::RemoveFragment))
3208 mask &= ~QUrlPrivate::Fragment;
3209 else if (d->fragment != url.d->fragment)
3210 return false;
3211
3212 if ((d->sectionIsPresent & mask) != (url.d->sectionIsPresent & mask))
3213 return false;
3214
3215 if (options.testFlag(f: QUrl::RemovePath))
3216 return true;
3217
3218 // Compare paths, after applying path-related options
3219 QString path1;
3220 d->appendPath(appendTo&: path1, options, appendingTo: QUrlPrivate::Path);
3221 QString path2;
3222 url.d->appendPath(appendTo&: path2, options, appendingTo: QUrlPrivate::Path);
3223 return path1 == path2;
3224}
3225
3226/*!
3227 Returns \c true if this URL and the given \a url are not equal;
3228 otherwise returns \c false.
3229
3230 \sa matches()
3231*/
3232bool QUrl::operator !=(const QUrl &url) const
3233{
3234 return !(*this == url);
3235}
3236
3237/*!
3238 Assigns the specified \a url to this object.
3239*/
3240QUrl &QUrl::operator =(const QUrl &url) noexcept
3241{
3242 if (!d) {
3243 if (url.d) {
3244 url.d->ref.ref();
3245 d = url.d;
3246 }
3247 } else {
3248 if (url.d)
3249 qAtomicAssign(d, x: url.d);
3250 else
3251 clear();
3252 }
3253 return *this;
3254}
3255
3256/*!
3257 Assigns the specified \a url to this object.
3258*/
3259QUrl &QUrl::operator =(const QString &url)
3260{
3261 if (url.isEmpty()) {
3262 clear();
3263 } else {
3264 detach();
3265 d->parse(url, parsingMode: TolerantMode);
3266 }
3267 return *this;
3268}
3269
3270/*!
3271 \fn void QUrl::swap(QUrl &other)
3272 \since 4.8
3273
3274 Swaps URL \a other with this URL. This operation is very
3275 fast and never fails.
3276*/
3277
3278/*!
3279 \internal
3280
3281 Forces a detach.
3282*/
3283void QUrl::detach()
3284{
3285 if (!d)
3286 d = new QUrlPrivate;
3287 else
3288 qAtomicDetach(d);
3289}
3290
3291/*!
3292 \internal
3293*/
3294bool QUrl::isDetached() const
3295{
3296 return !d || d->ref.loadRelaxed() == 1;
3297}
3298
3299static QString fromNativeSeparators(const QString &pathName)
3300{
3301#if defined(Q_OS_WIN)
3302 QString result(pathName);
3303 const QChar nativeSeparator = u'\\';
3304 auto i = result.indexOf(nativeSeparator);
3305 if (i != -1) {
3306 QChar * const data = result.data();
3307 const auto length = result.length();
3308 for (; i < length; ++i) {
3309 if (data[i] == nativeSeparator)
3310 data[i] = u'/';
3311 }
3312 }
3313 return result;
3314#else
3315 return pathName;
3316#endif
3317}
3318
3319/*!
3320 Returns a QUrl representation of \a localFile, interpreted as a local
3321 file. This function accepts paths separated by slashes as well as the
3322 native separator for this platform.
3323
3324 This function also accepts paths with a doubled leading slash (or
3325 backslash) to indicate a remote file, as in
3326 "//servername/path/to/file.txt". Note that only certain platforms can
3327 actually open this file using QFile::open().
3328
3329 An empty \a localFile leads to an empty URL (since Qt 5.4).
3330
3331 \snippet code/src_corelib_io_qurl.cpp 16
3332
    In the first line of the snippet above, a file URL is constructed from a
3334 local, relative path. A file URL with a relative path only makes sense
3335 if there is a base URL to resolve it against. For example:
3336
3337 \snippet code/src_corelib_io_qurl.cpp 17
3338
3339 To resolve such a URL, it's necessary to remove the scheme beforehand:
3340
3341 \snippet code/src_corelib_io_qurl.cpp 18
3342
3343 For this reason, it is better to use a relative URL (that is, no scheme)
3344 for relative file paths:
3345
3346 \snippet code/src_corelib_io_qurl.cpp 19
3347
3348 \sa toLocalFile(), isLocalFile(), QDir::toNativeSeparators()
3349*/
3350QUrl QUrl::fromLocalFile(const QString &localFile)
3351{
3352 QUrl url;
3353 if (localFile.isEmpty())
3354 return url;
3355 QString scheme = fileScheme();
3356 QString deslashified = fromNativeSeparators(pathName: localFile);
3357
3358 // magic for drives on windows
3359 if (deslashified.size() > 1 && deslashified.at(i: 1) == u':' && deslashified.at(i: 0) != u'/') {
3360 deslashified.prepend(c: u'/');
3361 } else if (deslashified.startsWith(s: "//"_L1)) {
3362 // magic for shared drive on windows
3363 qsizetype indexOfPath = deslashified.indexOf(c: u'/', from: 2);
3364 QStringView hostSpec = QStringView{deslashified}.mid(pos: 2, n: indexOfPath - 2);
3365 // Check for Windows-specific WebDAV specification: "//host@SSL/path".
3366 if (hostSpec.endsWith(s: webDavSslTag(), cs: Qt::CaseInsensitive)) {
3367 hostSpec.truncate(n: hostSpec.size() - 4);
3368 scheme = webDavScheme();
3369 }
3370
3371 // hosts can't be IPv6 addresses without [], so we can use QUrlPrivate::setHost
3372 url.detach();
3373 if (!url.d->setHost(value: hostSpec.toString(), from: 0, iend: hostSpec.size(), mode: StrictMode)) {
3374 if (url.d->error->code != QUrlPrivate::InvalidRegNameError)
3375 return url;
3376
3377 // Path hostname is not a valid URL host, so set it entirely in the path
3378 // (by leaving deslashified unchanged)
3379 } else if (indexOfPath > 2) {
3380 deslashified = deslashified.right(n: deslashified.size() - indexOfPath);
3381 } else {
3382 deslashified.clear();
3383 }
3384 }
3385
3386 url.setScheme(scheme);
3387 url.setPath(path: deslashified, mode: DecodedMode);
3388 return url;
3389}
3390
3391/*!
3392 Returns the path of this URL formatted as a local file path. The path
3393 returned will use forward slashes, even if it was originally created
3394 from one with backslashes.
3395
3396 If this URL contains a non-empty hostname, it will be encoded in the
3397 returned value in the form found on SMB networks (for example,
3398 "//servername/path/to/file.txt").
3399
3400 \snippet code/src_corelib_io_qurl.cpp 20
3401
    \note If the path component of this URL contains a non-UTF-8 binary
    sequence (such as %80), the behavior of this function is undefined.
3404
3405 \sa fromLocalFile(), isLocalFile()
3406*/
3407QString QUrl::toLocalFile() const
3408{
3409 // the call to isLocalFile() also ensures that we're parsed
3410 if (!isLocalFile())
3411 return QString();
3412
3413 return d->toLocalFile(options: QUrl::FullyDecoded);
3414}
3415
3416/*!
3417 \since 4.8
3418 Returns \c true if this URL is pointing to a local file path. A URL is a
3419 local file path if the scheme is "file".
3420
3421 Note that this function considers URLs with hostnames to be local file
3422 paths, even if the eventual file path cannot be opened with
3423 QFile::open().
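
    For illustration (the paths and hosts are made up):

    \code
    QUrl("file:///home/user/notes.txt").isLocalFile();    // true
    QUrl("file://server/share/notes.txt").isLocalFile();  // true, even though a host is present
    QUrl("http://example.com/notes.txt").isLocalFile();   // false
    QUrl("/home/user/notes.txt").isLocalFile();           // false: no "file" scheme
    \endcode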
3424
3425 \sa fromLocalFile(), toLocalFile()
3426*/
3427bool QUrl::isLocalFile() const
3428{
3429 return d && d->isLocalFile();
3430}
3431
3432/*!
3433 Returns \c true if this URL is a parent of \a childUrl. \a childUrl is a child
3434 of this URL if the two URLs share the same scheme and authority,
3435 and this URL's path is a parent of the path of \a childUrl.
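
    A sketch of the resulting behavior, with made-up URLs:

    \code
    QUrl base("http://example.com/dir");

    base.isParentOf(QUrl("http://example.com/dir/file.txt"));    // true
    base.isParentOf(QUrl("http://example.com/directory"));       // false: "/dir" is not a whole path segment here
    base.isParentOf(QUrl("http://example.com/dir"));             // false: same path, not a child
    base.isParentOf(QUrl("http://other.example/dir/file.txt"));  // false: different authority
    \endcode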
3436*/
bool QUrl::isParentOf(const QUrl &childUrl) const
{
    QString childPath = childUrl.path();

    if (!d)
        return ((childUrl.scheme().isEmpty())
                && (childUrl.authority().isEmpty())
                && childPath.size() > 0 && childPath.at(0) == u'/');

    QString ourPath = path();

    return ((childUrl.scheme().isEmpty() || d->scheme == childUrl.scheme())
            && (childUrl.authority().isEmpty() || authority() == childUrl.authority())
            && childPath.startsWith(ourPath)
            && ((ourPath.endsWith(u'/') && childPath.size() > ourPath.size())
                || (!ourPath.endsWith(u'/') && childPath.size() > ourPath.size()
                    && childPath.at(ourPath.size()) == u'/')));
}
3455
3456
3457#ifndef QT_NO_DATASTREAM
3458/*! \relates QUrl
3459
3460 Writes url \a url to the stream \a out and returns a reference
3461 to the stream.
3462
3463 \sa{Serializing Qt Data Types}{Format of the QDataStream operators}
3464*/
3465QDataStream &operator<<(QDataStream &out, const QUrl &url)
3466{
3467 QByteArray u;
3468 if (url.isValid())
3469 u = url.toEncoded();
3470 out << u;
3471 return out;
3472}
3473
3474/*! \relates QUrl
3475
3476 Reads a url into \a url from the stream \a in and returns a
3477 reference to the stream.
3478
3479 \sa{Serializing Qt Data Types}{Format of the QDataStream operators}
3480*/
QDataStream &operator>>(QDataStream &in, QUrl &url)
{
    QByteArray u;
    in >> u;
    url.setUrl(QString::fromLatin1(u));
    return in;
}
3488#endif // QT_NO_DATASTREAM
3489
3490#ifndef QT_NO_DEBUG_STREAM
3491QDebug operator<<(QDebug d, const QUrl &url)
3492{
3493 QDebugStateSaver saver(d);
3494 d.nospace() << "QUrl(" << url.toDisplayString() << ')';
3495 return d;
3496}
3497#endif
3498
3499static QString errorMessage(QUrlPrivate::ErrorCode errorCode, const QString &errorSource, qsizetype errorPosition)
3500{
3501 QChar c = size_t(errorPosition) < size_t(errorSource.size()) ?
3502 errorSource.at(i: errorPosition) : QChar(QChar::Null);
3503
3504 switch (errorCode) {
3505 case QUrlPrivate::NoError:
3506 Q_UNREACHABLE_RETURN(QString()); // QUrl::errorString should have treated this condition
3507
3508 case QUrlPrivate::InvalidSchemeError: {
3509 auto msg = "Invalid scheme (character '%1' not permitted)"_L1;
3510 return msg.arg(args&: c);
3511 }
3512
3513 case QUrlPrivate::InvalidUserNameError:
3514 return "Invalid user name (character '%1' not permitted)"_L1
3515 .arg(args&: c);
3516
3517 case QUrlPrivate::InvalidPasswordError:
3518 return "Invalid password (character '%1' not permitted)"_L1
3519 .arg(args&: c);
3520
3521 case QUrlPrivate::InvalidRegNameError:
3522 if (errorPosition >= 0)
3523 return "Invalid hostname (character '%1' not permitted)"_L1
3524 .arg(args&: c);
3525 else
3526 return QStringLiteral("Invalid hostname (contains invalid characters)");
3527 case QUrlPrivate::InvalidIPv4AddressError:
3528 return QString(); // doesn't happen yet
3529 case QUrlPrivate::InvalidIPv6AddressError:
3530 return QStringLiteral("Invalid IPv6 address");
3531 case QUrlPrivate::InvalidCharacterInIPv6Error:
3532 return "Invalid IPv6 address (character '%1' not permitted)"_L1.arg(args&: c);
3533 case QUrlPrivate::InvalidIPvFutureError:
3534 return "Invalid IPvFuture address (character '%1' not permitted)"_L1.arg(args&: c);
3535 case QUrlPrivate::HostMissingEndBracket:
3536 return QStringLiteral("Expected ']' to match '[' in hostname");
3537
3538 case QUrlPrivate::InvalidPortError:
3539 return QStringLiteral("Invalid port or port number out of range");
3540 case QUrlPrivate::PortEmptyError:
3541 return QStringLiteral("Port field was empty");
3542
3543 case QUrlPrivate::InvalidPathError:
3544 return "Invalid path (character '%1' not permitted)"_L1
3545 .arg(args&: c);
3546
3547 case QUrlPrivate::InvalidQueryError:
3548 return "Invalid query (character '%1' not permitted)"_L1
3549 .arg(args&: c);
3550
3551 case QUrlPrivate::InvalidFragmentError:
3552 return "Invalid fragment (character '%1' not permitted)"_L1
3553 .arg(args&: c);
3554
3555 case QUrlPrivate::AuthorityPresentAndPathIsRelative:
3556 return QStringLiteral("Path component is relative and authority is present");
3557 case QUrlPrivate::AuthorityAbsentAndPathIsDoubleSlash:
3558 return QStringLiteral("Path component starts with '//' and authority is absent");
3559 case QUrlPrivate::RelativeUrlPathContainsColonBeforeSlash:
3560 return QStringLiteral("Relative URL's path component contains ':' before any '/'");
3561 }
3562
3563 Q_UNREACHABLE_RETURN(QString());
3564}
3565
3566static inline void appendComponentIfPresent(QString &msg, bool present, const char *componentName,
3567 const QString &component)
3568{
3569 if (present)
3570 msg += QLatin1StringView(componentName) % u'"' % component % "\","_L1;
3571}
3572
3573/*!
3574 \since 4.2
3575
3576 Returns an error message if the last operation that modified this QUrl
3577 object ran into a parsing error. If no error was detected, this function
3578 returns an empty string and isValid() returns \c true.
3579
3580 The error message returned by this function is technical in nature and may
3581 not be understood by end users. It is mostly useful to developers trying to
3582 understand why QUrl will not accept some input.
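
    A minimal sketch of a typical check. The URL is a made-up example with a
    non-numeric port, and qWarning() is only one possible way to report the
    problem (it requires including QDebug for the stream syntax):

    \code
    QUrl url("http://example.com:port/");   // "port" is not a valid port number
    if (!url.isValid())
        qWarning() << "Rejected URL:" << url.errorString();
    \endcode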

    \sa QUrl::ParsingMode
*/
QString QUrl::errorString() const
{
    QString msg;
    if (!d)
        return msg;

    QString errorSource;
    qsizetype errorPosition = 0;
    QUrlPrivate::ErrorCode errorCode = d->validityError(&errorSource, &errorPosition);
    if (errorCode == QUrlPrivate::NoError)
        return msg;

    msg += errorMessage(errorCode, errorSource, errorPosition);
    msg += "; source was \""_L1;
    msg += errorSource;
    msg += "\";"_L1;
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Scheme,
                             " scheme = ", d->scheme);
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::UserInfo,
                             " userinfo = ", userInfo());
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Host,
                             " host = ", d->host);
    appendComponentIfPresent(msg, d->port != -1,
                             " port = ", QString::number(d->port));
    appendComponentIfPresent(msg, !d->path.isEmpty(),
                             " path = ", d->path);
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Query,
                             " query = ", d->query);
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Fragment,
                             " fragment = ", d->fragment);
    if (msg.endsWith(u','))
        msg.chop(1);
    return msg;
}

/*!
    \since 5.1

    Converts a list of \a urls into a list of QString objects, using toString(\a options).
*/
QStringList QUrl::toStringList(const QList<QUrl> &urls, FormattingOptions options)
{
    QStringList lst;
    lst.reserve(urls.size());
    for (const QUrl &url : urls)
        lst.append(url.toString(options));
    return lst;
}

/*!
    \since 5.1

    Converts a list of strings representing \a urls into a list of URLs, using QUrl(str, \a mode).
    Note that this means all strings must be URLs, not, for instance, local paths.
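
    A minimal sketch of a round trip through both list helpers; the function
    name \c normalizedUrlStrings is hypothetical:

    \code
    #include <QList>
    #include <QStringList>
    #include <QUrl>

    // Hypothetical helper: parses each string as a URL and formats it again
    // with the default formatting options of toStringList().
    QStringList normalizedUrlStrings(const QStringList &inputs)
    {
        const QList<QUrl> urls = QUrl::fromStringList(inputs, QUrl::TolerantMode);
        return QUrl::toStringList(urls);
    }
    \endcode

    A local path such as "/home/user/test.html" passed through this helper is
    parsed as a relative URL reference rather than converted to a file:// URL;
    use QUrl::fromLocalFile() or QUrl::fromUserInput() for such input.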
*/
QList<QUrl> QUrl::fromStringList(const QStringList &urls, ParsingMode mode)
{
    QList<QUrl> lst;
    lst.reserve(urls.size());
    for (const QString &str : urls)
        lst.append(QUrl(str, mode));
    return lst;
}

/*!
    \typedef QUrl::DataPtr
    \internal
*/

/*!
    \fn DataPtr &QUrl::data_ptr()
    \internal
*/

/*!
    Returns the hash value for the \a url. If specified, \a seed is used to
    initialize the hash.
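
    This overload is what allows QUrl to serve as a key in QHash or as an
    element of QSet. A minimal sketch, in which the helper name
    \c countDistinctUrls is hypothetical:

    \code
    #include <QList>
    #include <QSet>
    #include <QUrl>

    // Hypothetical helper: QSet relies on qHash(const QUrl &) and
    // QUrl::operator==() to collapse duplicate entries.
    qsizetype countDistinctUrls(const QList<QUrl> &urls)
    {
        const QSet<QUrl> distinct(urls.cbegin(), urls.cend());
        return distinct.size();
    }
    \endcode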

    \relates QHash
    \since 5.0
*/
size_t qHash(const QUrl &url, size_t seed) noexcept
{
    if (!url.d)
        return qHash(-1, seed); // the hash of an unset port (-1)

    return qHash(url.d->scheme) ^
        qHash(url.d->userName) ^
        qHash(url.d->password) ^
        qHash(url.d->host) ^
        qHash(url.d->port, seed) ^
        qHash(url.d->path) ^
        qHash(url.d->query) ^
        qHash(url.d->fragment);
}

static QUrl adjustFtpPath(QUrl url)
{
    if (url.scheme() == ftpScheme()) {
        QString path = url.path(QUrl::PrettyDecoded);
        if (path.startsWith("//"_L1))
            url.setPath("/%2F"_L1 + QStringView{path}.mid(2), QUrl::TolerantMode);
    }
    return url;
}

static bool isIp6(const QString &text)
{
    QIPAddressUtils::IPv6Address address;
    return !text.isEmpty() && QIPAddressUtils::parseIp6(address, text.begin(), text.end()) == nullptr;
}

/*!
    Returns a valid URL from a user-supplied \a userInput string if one can be
    deduced. If that is not possible, an invalid QUrl() is returned.

    This allows the user to input a URL or a local file path in the form of a plain
    string. This string can be manually typed into a location bar, obtained from
    the clipboard, or passed in via command line arguments.

    When the string is not already a valid URL, a best guess is performed,
    making various assumptions.

    If the string corresponds to a valid file path on the system,
    a file:// URL is constructed, using QUrl::fromLocalFile().

    If that is not the case, an attempt is made to turn the string into an
    http:// or ftp:// URL. The latter is chosen when the string starts with
    'ftp'. The result is then passed through QUrl's tolerant parser; on
    success, a valid QUrl is returned, otherwise a QUrl().

    \section1 Examples:

    \list
    \li qt-project.org becomes http://qt-project.org
    \li ftp.qt-project.org becomes ftp://ftp.qt-project.org
    \li hostname becomes http://hostname
    \li /home/user/test.html becomes file:///home/user/test.html
    \endlist

    In order to be able to handle relative paths, this method takes an optional
    \a workingDirectory path. This is especially useful when handling command
    line arguments.
    If \a workingDirectory is empty, no handling of relative paths will be done.

    By default, an input string that looks like a relative path will only be treated
    as such if the file actually exists in the given working directory.
    If the application can handle files that don't exist yet, it should pass the
    flag AssumeLocalFile in \a options.
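
    A minimal sketch of resolving a command line argument this way; the
    helper name \c urlFromArgument is hypothetical:

    \code
    #include <QDir>
    #include <QString>
    #include <QUrl>

    // Hypothetical helper: resolves relative paths against the current
    // directory and accepts files that do not exist yet.
    QUrl urlFromArgument(const QString &argument)
    {
        return QUrl::fromUserInput(argument, QDir::currentPath(), QUrl::AssumeLocalFile);
    }
    \endcode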

    \since 5.4
*/
QUrl QUrl::fromUserInput(const QString &userInput, const QString &workingDirectory,
                         UserInputResolutionOptions options)
{
    QString trimmedString = userInput.trimmed();

    if (trimmedString.isEmpty())
        return QUrl();

    // Check for IPv6 addresses, since a path starting with ":" is absolute (a resource)
    // and IPv6 addresses can start with "c:" too
    if (isIp6(trimmedString)) {
        QUrl url;
        url.setHost(trimmedString);
        url.setScheme(QStringLiteral("http"));
        return url;
    }

    const QUrl url = QUrl(trimmedString, QUrl::TolerantMode);

    // Check for a relative path
    if (!workingDirectory.isEmpty()) {
        const QFileInfo fileInfo(QDir(workingDirectory), userInput);
        if (fileInfo.exists())
            return QUrl::fromLocalFile(fileInfo.absoluteFilePath());

        // Check both QUrl::isRelative (to detect full URLs) and QDir::isAbsolutePath (since on Windows drive letters can be interpreted as schemes)
        if ((options & AssumeLocalFile) && url.isRelative() && !QDir::isAbsolutePath(userInput))
            return QUrl::fromLocalFile(fileInfo.absoluteFilePath());
    }

    // Check first for files, since on Windows drive letters can be interpreted as schemes
    if (QDir::isAbsolutePath(trimmedString))
        return QUrl::fromLocalFile(trimmedString);

    QUrl urlPrepended = QUrl("http://"_L1 + trimmedString, QUrl::TolerantMode);

    // Check the most common case of a valid url with a scheme
    // We check if the port would be valid by adding the scheme to handle the case host:port
    // where the host would be interpreted as the scheme
    if (url.isValid()
        && !url.scheme().isEmpty()
        && urlPrepended.port() == -1)
        return adjustFtpPath(url);

    // Else, try the prepended one and adjust the scheme from the host name
    if (urlPrepended.isValid() && (!urlPrepended.host().isEmpty() || !urlPrepended.path().isEmpty())) {
        qsizetype dotIndex = trimmedString.indexOf(u'.');
        const QStringView hostscheme = QStringView{trimmedString}.left(dotIndex);
        if (hostscheme.compare(ftpScheme(), Qt::CaseInsensitive) == 0)
            urlPrepended.setScheme(ftpScheme());
        return adjustFtpPath(urlPrepended);
    }

    return QUrl();
}

QT_END_NAMESPACE
